Design, deploy, and manage production Kubernetes clusters, including workload scheduling, resource quotas, network policies, ingress/egress controls, and RBAC
Develop and maintain deployment patterns using Helm, Kustomize, and Kubernetes operators
Build and optimize CI/CD pipelines using Infrastructure as Code and GitOps principles
Provision and manage infrastructure using Terraform or similar automation tools
Build secure container images and manage image versioning and promotion across environments
Implement observability solutions using Prometheus, Grafana, OpenTelemetry, or similar technologies
Conduct load testing, resilience validation, performance tuning, and capacity planning
Perform Linux administration, troubleshooting, and system hardening
Support stateful workloads including PostgreSQL, MySQL, Redis, and Kafka
Participate in incident response, root cause analysis, and reliability improvement initiatives
Document architecture, operational procedures, backup and recovery processes, and infrastructure lifecycle plans
Requirements
2–5+ years of experience in Cloud Infrastructure, DevOps, or SRE roles
Hands-on experience managing Kubernetes clusters in production environments
Experience building and maintaining CI/CD pipelines
Experience working with Infrastructure as Code tools such as Terraform
Strong knowledge of containerization and container security best practices
Solid Linux administration and troubleshooting skills
Proficiency in Python or a similar scripting language
Understanding of networking, Kubernetes security, and deployment strategies
Experience with load testing and performance analysis
Ability to work effectively across technical teams in a collaborative environment