Design, implement, and maintain cloud infrastructure on AWS using Terraform and Ansible, following existing conventions and extending them thoughtfully.
Manage and support the AWS services in our stack, including EC2, ECS, RDS, S3, IAM, VPC, and CloudFront.
Maintain and improve infrastructure-as-code practices, ensuring consistency, reproducibility, and auditability across environments.
Participate in capacity planning and cost optimization, identifying opportunities to improve resource efficiency without compromising reliability.
Build, maintain, and improve CI/CD pipelines (GitHub Actions or equivalent) to support reliable, automated delivery across development, staging, and production environments.
Work with engineering teams to improve build speed, deployment safety, and rollback capabilities.
Support blue/green and canary deployment strategies as appropriate for our platform needs.
Participate in the on-call rotation and own production incidents end-to-end, from detection through root cause analysis, resolution, and post-mortem.
Use observability tooling (Datadog, CloudWatch, or equivalent) to monitor system health, establish alerting thresholds, and proactively surface issues before they impact customers.
Contribute to runbooks, incident documentation, and process improvements that steadily reduce mean time to resolution.
Apply security best practices across infrastructure — IAM policy scoping, secrets management, network segmentation, vulnerability patching, and access controls.
Support compliance and audit requirements by maintaining clear documentation and ensuring infrastructure changes are tracked and reviewable.
Work closely with the senior engineer on the team to learn existing systems deeply and contribute to architectural improvements over time.
Proactively identify areas for improvement — tooling, automation gaps, manual processes, reliability risks — and raise them constructively with the team.
Document infrastructure clearly so that other engineers can understand and operate the systems they depend on.
Requirements
8+ years of professional DevOps, infrastructure, or platform engineering experience in production environments.
Hands-on proficiency with Terraform for infrastructure provisioning — writing modules, managing state, and working across environments.
Deep familiarity with AWS, including compute (EC2, ECS), storage and databases (S3, RDS), networking (VPC, Route 53, CloudFront), and IAM.
Experience with Ansible for configuration management and automation across server fleets or container environments.
Strong understanding of CI/CD principles and hands-on experience building or maintaining pipelines (GitHub Actions, GitLab CI, CircleCI, or equivalent).
Experience with Linux system administration, shell scripting (Bash), and general infrastructure debugging.
Demonstrated ability to work within an established infrastructure — understanding existing design decisions, following conventions, and improving incrementally rather than replacing wholesale.
Solid grasp of security fundamentals: IAM least-privilege, secrets management, network access controls, and patching hygiene.
Strong written and verbal communication skills in English — able to collaborate asynchronously across time zones and document work clearly.
BSc in Computer Science, Engineering, or a related field — or equivalent professional experience.