Deploy and operate production-grade Kubernetes clusters (GKE + on-prem k8s)
Manage all infrastructure as code using Terraform (mandatory) + Helm + Kustomize
Maintain configuration management with Ansible (existing playbooks) while evolving toward more modern practices
Ensure high availability and disaster recovery for databases and queues (Cloud SQL, AlloyDB, Memorystore Redis, managed Kafka/RabbitMQ, Elasticsearch/OpenSearch on GKE)
Build a modern observability stack: Prometheus + Grafana + Loki/Tempo + OpenTelemetry, integrated with Cloud Monitoring and Cloud Logging