Role Overview
- Provision and manage cloud infrastructure across compute, networking, storage, and managed databases
- Maintain consistency across development, staging, and production environments
- Plan and execute workload migrations across cloud providers with minimal disruption
- Identify and implement infrastructure cost optimization opportunities
- Build and maintain CI/CD pipelines across a multi-repository environment
- Implement deployment strategies such as blue/green, canary, and rolling deployments
- Develop automation tooling and scripts to reduce manual effort and improve delivery speed
- Write and maintain Terraform configurations for multi-account, multi-environment systems
- Establish IaC standards, module reuse patterns, and state management practices
- Drive infrastructure changes through version-controlled, peer-reviewed workflows
- Participate in on-call rotation and lead incident response efforts
- Maintain observability through dashboards, alerting, logging, and SLO tracking
- Conduct post-incident reviews and implement long-term remediations
- Manage secrets, certificates, and access controls using least-privilege principles
- Apply security best practices across infrastructure, containers, and CI/CD pipelines
- Support audit and compliance initiatives through documentation and access reviews
- Partner with engineering teams on infrastructure guidance and service design
- Create architecture documentation, runbooks, and operational guides
- Help build self-service platform capabilities for engineering teams
Requirements
- 5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering
- Strong AWS experience with EC2, EKS/ECS, VPC, Route 53, IAM, S3, EBS, and CloudWatch
- Deep Terraform experience across multi-environment configurations
- Experience with Kubernetes, ECS, Docker, Argo CD, and Helm in production environments
- Experience designing and maintaining CI/CD pipelines with GitHub Actions or similar tools
- Strong observability and monitoring experience (Datadog, Prometheus/Grafana, New Relic, etc.)
- Solid networking fundamentals including DNS, TLS, VPNs, load balancing, and VPC design
- Experience as a network engineer (preferred)
- Experience administering PostgreSQL- and MySQL-compatible databases (preferred)
- Experience working with Google Cloud or Azure (preferred)
- Kubernetes, AWS, or Terraform certification (preferred)
Tech Stack
- AWS
- Azure
- Cloud
- DNS
- Docker
- EC2
- Grafana
- Kubernetes
- MySQL
- Postgres
- Prometheus
- Terraform
Benefits
Health, dental, and vision insurance for your family, 401(k), paid time off, sick leave, parental leave, and more.