Design end-to-end reliability and automation solutions to solve organization-wide challenges involving multiple teams and stakeholders.
Lead cloud infrastructure automation, orchestration, and configuration management using infrastructure-as-code principles.
Define and implement SRE best practices related to availability, scalability, monitoring, and incident response.
Proactively monitor alerts and lead troubleshooting of complex issues related to cloud infrastructure and automation systems.
Implement security automation practices to enforce compliance, monitor security events, and automate responses to security threats.
Partner with engineering and operations teams to reduce operational toil and improve overall service reliability.
Maintain comprehensive documentation of automated processes, configurations, and troubleshooting procedures.
Stay up to date with industry best practices, emerging technologies, and cloud service provider updates to continuously improve automation and reliability solutions.
Influence vendors and open-source projects to address reliability and automation gaps not covered by existing technologies.
Requirements
8+ years of experience with cloud platforms (GCP), including designing, operating, and optimizing large-scale enterprise cloud environments.
8+ years of experience in software engineering and programming, with strong proficiency in Python for automation, tooling, and operational workflows.
8+ years of experience with automation tools and frameworks, building scalable automation solutions to reduce operational toil and improve system reliability.
8+ years of experience in DevSecOps, CI/CD practices, and infrastructure as code, implementing secure, repeatable, and reliable deployment pipelines.
8+ years of experience in cloud security and compliance, including security automation, monitoring, and automated incident response.
Experience defining and managing SLOs, SLIs, and error budgets.
Exposure to large-scale, multi-team or enterprise cloud environments.
Contributions to open-source projects or vendor platforms.
Advanced experience with observability, logging, and monitoring tools.
Technical leadership or mentoring experience within SRE or DevOps teams.
Bachelor’s degree in Computer Science, Information Technology, or a related field.