Apply your passion for infrastructure as code and continuous deployment to build scalable, highly reliable systems.
Define and own KPIs around system availability, quality and scale.
Partner with our developers and quality engineering teams to automate the monitoring, alerting, availability and scalability of our applications and systems.
Ensure system availability and business continuity by implementing redundant servers/services.
Manage after-hours infrastructure updates and maintenance.
Proactively research and propose the use of new concepts, processes, technologies, and tools.
Partner with software developers to create Mist standards for microservices (APIs, schemas, serialization, data stores, and best practices).
Run secure and scalable applications for highly available, multi-region, AWS and GCP deployments.
Ship code several times per week.
Be part of our on-call rotation.
Own disaster recovery and business continuity plans.

An extensive background in developing and operating large-scale cloud-based distributed applications.
Direct experience developing/running applications on AWS or Google Cloud.
A laser focus on designing infrastructure solutions for scalability, reliability, high availability, performance, security, software maintainability, and operational excellence.
The ability to "fix the plane while in flight" (not just support greenfield solutions).
The ability to prioritize existing technical and infrastructure debt, and the experience to build and execute a plan to pay it off.
Experience delivering web-scale infrastructure for a global market at high release velocity.
A deep understanding of distributed system design and dependency management.
Must have solid experience with at least two of the following languages: Go, Java, Python.
10+ years of industry experience managing infrastructure.
5 years of Kubernetes administration in a large-scale SaaS environment.
5 years of experience maintaining production systems on AWS or GCP.
3 years of experience implementing, managing, and monitoring metrics specific to SaaS applications.
3 years of experience using infrastructure-as-code software (e.g., Terraform, Google Cloud Deployment Manager, AWS CloudFormation).
5 years of experience with continuous integration practices and tools (e.g., Jenkins, Travis CI, CircleCI).

Health & Wellbeing: We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.
Personal & Professional Development: We also invest in your career, because the better you are, the better we all are. We have specific programs tailored to helping you reach your career goals, whether you want to become a recognized expert in your field or apply your skills in another division.
Unconditional Inclusion: We are unconditionally inclusive in the way we work and celebrate individual uniqueness. People from varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.

Site Reliability Engineer – DevOps

Key skills