Must-haves:
Candidate should have sound knowledge of Observability & Monitoring (Splunk, Grafana, Hubble)
Candidate should be proficient in Scrum Facilitation & Agile Delivery
Candidate should be proficient in Shell Script Automation
Candidate should be proficient in CI/CD & DevOps Tools
Candidate should have experience in SQL & Database Reporting
Role Description:
We are looking for a highly skilled Site Reliability Engineer (SRE) to design, build, and maintain reliable, scalable, and high-performance systems.
The ideal candidate has strong hands-on experience with AWS cloud services, Kubernetes, Docker, DevOps practices, and Python, and is passionate about automation, system reliability, and operational excellence.
Key Responsibilities:
Design, deploy, and maintain scalable and highly available infrastructure on AWS
Manage containerized applications using Docker and orchestrate them with Kubernetes
Build and maintain CI/CD pipelines to automate build, test, and deployment processes
Implement infrastructure as code (IaC) using tools like Terraform or CloudFormation
Develop automation and internal tools using Python
Monitor system performance, availability, and reliability; proactively identify and resolve issues
Define and track SLIs, SLOs, and SLAs
Participate in on-call rotations, incident response, and post-incident reviews
Collaborate closely with development, security, and operations teams
Improve system resilience, fault tolerance, and disaster recovery strategies
Ensure best practices for security, cost optimization, and performance tuning
Skills: Docker, Kubernetes, Site Reliability Engineering (SRE)
Experience Required: 6-8 years