About this role
Role Overview
Design, build, and maintain multi-cloud Infrastructure as Code (AWS, Azure; GCP desirable), ensuring scalable, secure, high-performing, and cost-efficient environments across development, staging, and production.
Architect and lead cloud migrations, platform modernization initiatives, and greenfield infrastructure builds, owning environment deployment, scaling, reliability, and continuous improvement of CI/CD pipelines and automation standards.
Establish and enforce deployment best practices, operational standards, and DevOps processes to ensure consistent, repeatable, and resilient infrastructure delivery.
Troubleshoot complex infrastructure and deployment challenges, provide thorough root cause analysis, implement long-term fixes, and support on-call ownership, escalation management, and incident response activities.
Play an active role in post-mortems and reliability reviews, embedding lessons learned into architecture, processes, and automation to continuously strengthen platform stability.
Design, document, and regularly test Disaster Recovery (DR) and Business Continuity (BC) strategies with clearly defined RPO/RTO targets across regions, ensuring preparedness for component and regional failures.
Lead the evolution from highly available systems to fault-tolerant architectures capable of withstanding infrastructure or regional disruptions without service impact.
Introduce and operationalize chaos engineering and resilience testing practices (e.g., fault injection) to proactively identify systemic weaknesses before they affect production.
Advance observability practices beyond dashboards by implementing predictive monitoring, anomaly detection, intelligent alerting, and AI-assisted operational tooling (AIOps) for log analysis and automated triage or remediation.
Leverage automation and AI to streamline on-call workflows, reduce operational overhead, improve alert quality, and accelerate post-incident reporting and RCA documentation.
Build and maintain modular, reusable IaC patterns (Terraform, Bicep, Pulumi) that support multi-cloud portability, rapid environment replication, and strong developer self-service capabilities with appropriate security and governance guardrails.
Take shared ownership of cloud cost optimization through tagging strategies, automated scaling policies, and improved visibility into multi-cloud spend.
Define and document architecture standards, operational runbooks, and infrastructure best practices, collaborating closely with Engineering, Security, IT, and Product stakeholders to align platform capabilities with business objectives.
Mentor and upskill IT and Engineering team members in cloud architecture, DevOps, resilience engineering, and operational excellence, acting as a trusted technical authority who translates infrastructure risk and strategy into clear business impact for senior leadership.
Requirements
Bring 10+ years of experience in complex, production-grade DevOps environments
Have strong hands-on mastery of IaC tooling (Terraform, CloudFormation, ARM, Bicep, or Pulumi)
Are fluent in AWS and Azure, with a track record of leading cloud migrations or major re-architecture projects
Excel in CI/CD automation using tools like GitHub Actions, GitLab CI, Jenkins, or Azure DevOps
Have strong knowledge of Docker, Kubernetes, networking, security, observability, and live platform support
Can design, implement, and maintain Disaster Recovery and Business Continuity strategies that support mission-critical systems
Communicate clearly and confidently, translating technical risk into business language for stakeholders
Tech Stack
AWS
Azure
Cloud
Docker
Google Cloud Platform
Jenkins
Kubernetes
Terraform
Benefits
Flexible Work Style: Enjoy remote work, flexible PTO, and a healthy work-life balance.
Career Growth: Access mentorship, development programs, and clear pathways for advancement.
Competitive Compensation & Benefits: Market-leading pay, bonuses, and comprehensive medical and retirement plans.
Recognition & Support: Be seen and celebrated through regular recognition programs and supportive leadership.