Architect and execute a comprehensive overhaul of our existing CI/CD pipelines (leveraging tools like GitHub Actions/Pipelines) to enhance speed, reliability, and security, preparing our delivery process for future growth.
Proactively identify, diagnose, and resolve complex infrastructure and application issues (performance, availability, scaling) across our production environments (primarily AWS, with exposure to GCP).
Design and manage infrastructure as code (IaC) using Terraform to provision and maintain resilient, scalable, and cost-effective cloud resources.
Collaborate closely with software engineering teams to embed SRE principles, optimize application performance, and eradicate developer friction points and bottlenecks.
Document technical systems, processes, and workflows thoroughly to ensure clarity, maintainability, and operational consistency.
Requirements
1+ years of experience in DevOps, Infrastructure Engineering, Site Reliability Engineering (SRE), or a related discipline.
Demonstrated experience using modern AI-assisted development tools (e.g., GitHub Copilot, ChatGPT, Claude, or similar) as part of daily engineering workflows.
Proven expertise in designing, building, and maintaining robust, scalable, and secure CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI).
Experience managing production workloads primarily in AWS. Knowledge of GCP is a plus.
Proficiency with Terraform or similar tools for declarative infrastructure management.
Highly skilled in performance tuning, troubleshooting, and root cause analysis across complex cloud platforms.
Exceptional verbal and written communication skills, with the ability to clearly articulate technical strategy, trade-offs, and complex issues to both technical and non-technical audiences. A proven ability to document complex systems and workflows is required.
Strong understanding of cloud infrastructure best practices, with a focus on reliability and cost optimization. Experience working in a regulated environment (e.g., healthcare, finance) is beneficial.
A proactive approach to identifying system weaknesses and driving projects from concept to production with minimal supervision.
Tech Stack
AWS
Cloud
Google Cloud Platform
Jenkins
Terraform
Benefits
Flexible work hours and flexible paid time off
Work remotely
Generous parental leave
Comprehensive healthcare, vision, and dental benefits