Scalence L.L.C. is a technology company that develops environments, applications, and tools for clients. The company is seeking a dedicated Site Reliability Engineer to bridge development and operations, keeping systems scalable, reliable, and efficient by automating and managing complex deployments.
Responsibilities:
- Write and maintain robust shell scripts to automate system tasks
- Develop and enhance features within our core application using Python
- Take ownership of the production environment by managing deployments using Spinnaker
- Ensure that all code moving to production is stable and performant
- Conduct thorough code reviews for both infrastructure-as-code and application logic to maintain high standards of reliability and security
- Manage and provision cloud infrastructure using Pulumi
- Maintain and scale our containerized workloads using Kubernetes and Docker
- Utilize Git and GitHub for collaborative development, branching strategies, and CI/CD integration
Requirements:
- 5+ years of experience in an SRE, DevOps, or Systems Engineering role
- Proficiency in shell scripting for system automation (required)
- Strong Python programming skills with the ability to contribute to application-level codebases
- Deep understanding of containerization (Docker) and orchestration (Kubernetes)
- Experience performing code reviews and managing high-stakes deployments in production environments
- A data-driven approach to troubleshooting and system performance optimization