Design, deploy, and operate Kubernetes clusters (EKS or self-managed) on AWS, ensuring high availability and security
Build and maintain GitHub Workflows and internal developer tooling to improve engineering velocity
Automate infrastructure provisioning and operational tasks using Python and tools like Terraform, OpenTofu, and Helm
Define and enforce platform standards around observability, cost management, resource scaling, and proactive incident management
Partner with application teams to support containerized workloads and resolve infrastructure bottlenecks
Collaborate with Customer Success teams by providing reliable and scalable tooling that supports seamless customer onboarding, integrations, and service delivery
Proficiency in Python or a similar language for scripting, automation, and building internal tools
Familiarity with infrastructure-as-code practices (Terraform, OpenTofu, and Helm)
A collaborative mindset and comfort working in a fast-moving environment
Familiarity with multi-account AWS strategies, AWS Organizations, and landing zone patterns for enterprise-scale environments
Experience with multi-tenancy patterns
Experience with service meshes (Istio) for managing microservice communication, traffic policies, and mutual TLS
Experience with GitOps workflows using ArgoCD or Flux for declarative, version-controlled infrastructure and application delivery
Exposure to container security tooling such as Falcon, Grype/Syft, or similar for vulnerability scanning, and OPA or Kyverno for policy enforcement
Experience with observability stacks such as Prometheus, Grafana, or the ELK/OpenSearch stack for metrics, logging, and distributed tracing across multiple Kubernetes clusters
Strong knowledge of integrating Kubernetes with AWS services (e.g., vpc-cni, external-secrets, ALB Ingress, Security Groups)