Maven AGI is an enterprise AI platform founded in July 2023 by executives from HubSpot, Google, and Stripe. The company is seeking a Senior DevOps Engineer to own and evolve the infrastructure powering its AI platform. The role focuses on designing, building, and operating production systems across cloud providers, and on ensuring platform reliability as the company onboards enterprise customers.
Responsibilities:
- Design, implement, and maintain cloud infrastructure (Azure, AWS) using infrastructure-as-code (Pulumi, Bicep, Terraform)
- Own Kubernetes cluster operations: deployments, scaling, monitoring, and incident response
- Build and optimize CI/CD pipelines for a large-scale monorepo
- Implement observability across services (metrics, logging, tracing, alerting)
- Drive reliability practices: SLOs, capacity planning, disaster recovery, and runbook development
- Collaborate with engineering teams to improve developer experience and deployment velocity
- Manage secrets, access controls, and infrastructure security posture
- Evaluate and adopt new tooling to reduce operational toil
Requirements:
- 3-7 years of professional DevOps/SRE/Infrastructure experience
- Deep expertise with Kubernetes in production (AKS, EKS, or GKE)
- Strong infrastructure-as-code skills (Pulumi, Terraform, or Bicep)
- Experience operating CI/CD systems (GitHub Actions, ArgoCD, or Jenkins)
- Proficiency in at least one scripting/programming language (Python, Go, TypeScript, or Bash)
- Solid understanding of IaaS providers, networking, DNS, load balancing, and TLS
- Experience with monitoring and observability stacks (Datadog, Prometheus, Grafana, or similar)
- Strong communication and cross-team collaboration skills
- Organized and detail-oriented; comfortable working in a ticket-driven environment
- Ability to thrive in fast-paced startup environments
- Experience with GPU infrastructure and ML/LLM serving workloads (vLLM, TEI)
- Familiarity with Temporal or other workflow orchestration systems
- Security and compliance background (SOC 2, HIPAA, GDPR)
- Experience with multi-cloud or hybrid (cloud + on-prem) deployments
- Cost optimization experience at scale