Adroitts is seeking a Lead Site Reliability Engineer – Cloud Platform (GCP/Kubernetes) with 12+ years of experience. The role involves managing cloud infrastructure, ensuring system reliability, and collaborating with cross-functional teams to optimize solutions.
Responsibilities:
- Managing and scaling cloud infrastructure
- Ensuring the reliability and performance of systems
- Automating processes
- Enabling effective troubleshooting to minimize downtime
- Collaborating with cross-functional teams to develop and optimize solutions
- Adhering to industry best practices for reliability and performance
Requirements:
- Google Cloud Platform (GCP)
- GKE (Google Kubernetes Engine)
- Kubernetes Architecture
- Terraform
- Helm
- Multi-Cloud (AWS/Azure)
- Platform Reliability & Scalability
- Prometheus
- Python/Bash Scripting
- 12+ years of experience
- Senior/SME-level SRE candidates only
- Strong expertise in Site Reliability Engineering, including designing and maintaining reliable, scalable, and high-performing systems
- Proficient in root cause analysis, debugging, and advanced troubleshooting of complex technical issues
- Comprehensive experience in software development with a focus on automation, scripting, and applying coding best practices
- Proficiency in system administration and infrastructure management, including cloud environments like GCP and container orchestration tools such as Kubernetes
- Experience with monitoring and alerting tools, CI/CD pipelines, and other DevOps practices
- Excellent problem-solving skills, attention to detail, and ability to work collaboratively in a team-oriented environment
- Proficiency in cloud computing technologies with a focus on GCP and experience in Kubernetes and containerization
- Bachelor's degree in Computer Science, Engineering, or a related field
- Experience working on large-scale systems, with knowledge of high-availability and disaster-recovery best practices
- Advanced certifications in cloud platforms or site reliability engineering