Design, build, and maintain production infrastructure on AWS (EKS, RDS, ECR, VPC, IAM, Secrets Manager, etc.).
Develop and manage our Kubernetes clusters: deploy workloads, tune Karpenter node autoscaling, maintain Helm charts, and keep clusters healthy.
Own and extend our GitOps deployment pipeline: GitHub Actions for CI/CD, ArgoCD for continuous delivery, and Helm for packaging.
Manage supporting cluster operators including Envoy Gateway, External DNS, cert-manager, Fluent Bit, and the AWS Load Balancer Controller.
Own and improve our observability stack: Grafana for dashboards, Loki for log aggregation, Tempo for distributed tracing, and Prometheus for metrics.
Support multi-environment reliability across dev, stage, and production GovCloud accounts.
Improve system resilience through load testing (Locust), E2E testing (Playwright/Cucumber), and thoughtful capacity planning.
Contribute to backend services (FastAPI, SQLAlchemy) in Python and TypeScript.
Work alongside product engineers as a first-class contributor, making architecture decisions that balance speed, cost, and reliability.
Build developer experience tooling: local dev environments, CI pipeline improvements, and automated testing scaffolds that make the whole team faster.
Support and extend Temporal-based workflow orchestration for background processing.
Implement least-privilege IAM policies, IRSA (IAM Roles for Service Accounts), and network segmentation in a GovCloud environment.
Manage secrets through AWS Secrets Manager and the External Secrets Operator with automated rotation.
Maintain TLS automation via cert-manager and OIDC authentication flows.
Enable SOC 2, CMMC, and FedRAMP compliance activities: GRC platform integration, audit logging pipelines, FIPS-validated endpoint configuration, system boundary documentation, and evidence collection for third-party assessments.
Requirements
5+ years of professional experience in infrastructure, DevOps, SRE, or platform engineering.
Deep hands-on experience with AWS services in production (EKS, IAM, Secrets Manager, ECR, RDS). Experience with or strong working knowledge of AWS GovCloud is a significant plus.
Comfort with Linux systems administration and shell scripting.
Familiarity with compliance-driven infrastructure: audit logging, access controls, and evidence collection for frameworks like CMMC, FedRAMP, and SOC 2.
A collaborative, low-ego mindset: you thrive in small, fast-moving teams.
Nice to Have
Experience with Envoy Gateway or the Kubernetes Gateway API.
Background in PostgreSQL administration and schema-based multi-tenancy.
Familiarity with the Grafana observability stack (Loki, Tempo, Prometheus).
Experience with Karpenter for node autoscaling, or with other strategies for optimizing cloud spend.
Experience with Temporal for workflow orchestration.
Experience in a startup or high-growth environment where you wore many hats.