Architect and evolve our DevOps ecosystem, champion cloud cost governance, and implement best-in-class container orchestration practices.
Work cross-functionally with engineering, security, and finance teams to ensure operational excellence while proactively managing infrastructure spend.
Responsibilities
Lead end-to-end DevOps strategy, including CI/CD pipelines, automation, infrastructure-as-code, and release engineering.
Design scalable, resilient cloud-native architectures aligned with business growth.
Establish DevOps best practices, reliability standards, and operational governance.
Architect and manage large-scale Kubernetes environments for production workloads.
Optimize workloads across clusters for performance, reliability, and cost efficiency.
Build and maintain containerized applications using Docker and Kubernetes, ensuring portability and scalability.
Drive multi-cluster, multi-region deployments where necessary.
Own infrastructure cost visibility and optimization initiatives.
Implement cloud cost-saving strategies including rightsizing, reserved capacity planning, auto-scaling optimization, and workload scheduling.
Create dashboards and reporting mechanisms to track infrastructure ROI and spend trends.
Continuously identify inefficiencies and implement measurable cost-reduction initiatives without compromising performance.
Design and implement comprehensive monitoring systems using Grafana and related observability tools.
Build real-time dashboards for system health, performance metrics, and cost insights.
Establish alerting frameworks to minimize downtime and improve incident response.
Drive improvements in system reliability through data-driven monitoring and post-incident analysis.
Automate provisioning, deployments, scaling, and recovery processes.
Strengthen system resilience and availability, and refine disaster recovery strategies.
Requirements
9–15 years of experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles.
Deep expertise in container orchestration, with production-grade Docker and Kubernetes implementations.
Strong hands-on experience with Grafana, monitoring systems, and observability frameworks.
Proven track record in cost-saving initiatives and infrastructure cost planning in cloud environments.
Experience designing highly available, scalable systems in AWS, Azure, or GCP.
Strong understanding of Infrastructure-as-Code (Terraform, CloudFormation, etc.).
Expertise in CI/CD automation and release management.
Solid knowledge of networking, security best practices, and cloud architecture patterns.