Design, build, and evolve highly reliable software systems and platforms that support large-scale, distributed services.
Lead the development of GitOps-based workflows and deployment automation.
Configure and operate services on Kubernetes-based platforms using standardized deployment and configuration patterns.
Build and maintain infrastructure-as-code using tools such as Terraform to provision and manage cloud resources on AWS.
Apply AI/ML techniques and tools to enhance the software delivery lifecycle, improve engineering productivity, and streamline operational workflows.
Apply SRE principles to the design and operation of software systems, emphasizing reliability, scalability, risk management, and operational efficiency.
Partner with application and platform teams to standardize patterns for service configuration, rollout strategies, and environment management.
Identify and reduce systemic sources of risk, toil, and inefficiency through thoughtful engineering and automation.
Provide technical leadership through design reviews, code reviews, and mentoring, helping raise engineering quality across the organization.
Requirements
A degree in Computer Science, Engineering, or equivalent experience is required.
10+ years of professional software engineering experience, with a strong emphasis on distributed systems and cloud-native architectures.
Deep experience with GitOps, CI/CD pipelines, and software delivery systems at scale.
Extensive hands-on experience with Kubernetes-based platforms.
Demonstrated use of AI/ML or intelligent automation to improve engineering productivity, reliability, or operational efficiency.
Strong expertise with cloud platforms such as AWS, and with infrastructure-as-code practices.
Strong programming skills in one or more languages commonly used in platform engineering (e.g., JavaScript/TypeScript, Go, Python, or similar).
Demonstrated experience applying Site Reliability Engineering practices to the design, delivery, and operation of large-scale production systems.
Ability to work effectively across teams, communicate complex technical concepts clearly, and influence engineering direction through technical leadership.
A continuous-improvement mindset, with a passion for building systems that make other engineers more effective.