Design infrastructure and automated systems to support our distributed architecture
Build and manage CI/CD pipelines, continuously improving their reliability and speed and reducing lead time for changes
Forecast and plan for the infrastructure needs of a fast-growing SaaS company and find ways to improve efficiency and cost
Trace performance bottlenecks and identify optimizations and improvements at both the infrastructure and application level
Collaborate with our engineering team to meet our customers' demanding SLO and SLA requirements
Maintain highly available web and backend systems that serve millions of users and thousands of requests per second
Collaborate closely with developers to set up, configure, and plan the cloud services needed for new feature development on AWS
Analyze system performance and capacity and plan for future growth
Secure our infrastructure at both the cloud layer (IAM) and the application layer (PKI)
Build and expand monitoring and alerting systems for both infrastructure and business operations, using internal tools and integrating with established third-party SaaS tools
Establish comprehensive infrastructure-as-code coverage to support our entire platform
Develop tools to enhance and support developer productivity
Champion the automation of manual processes and reduce operational overhead
Requirements
3+ years of DevOps or software development experience
3+ years of experience building, maintaining, and scaling database technologies such as PostgreSQL, MySQL, Redis, and DynamoDB
2+ years of experience orchestrating large-scale distributed microservice deployments on Kubernetes and EC2
2+ years of experience building and managing EKS clusters, and strong knowledge of the K8s ecosystem
2+ years of experience with Prometheus/Grafana/CloudWatch for metrics monitoring, the ELK/OpenSearch stack for logging, and PagerDuty for alerting
Tech Stack
AWS
Cloud
DynamoDB
EC2
Grafana
Kubernetes
MySQL
Postgres
Prometheus
Redis
Benefits
Workday Bonus Plan or role-specific commission/bonus