Braintrust is the AI observability platform that connects evals and observability to improve AI performance in production. Braintrust is seeking a Cloud Infrastructure Engineer to build reliable infrastructure and improve the deployment experience for developers and customers across multiple cloud platforms.
Responsibilities:
- Build and maintain Terraform modules for both internal infrastructure and customer deployments
- Work directly with customers in Slack to support self-hosting and troubleshoot infrastructure issues
- Build tools that make it easier for customers to support themselves
- Own and improve our CI/CD pipeline: reduce build times, improve failure visibility, and enable safer, faster releases
- Centralize and scale observability, including logs, metrics, dashboards, and alerts
- Partner with engineering teams to build and evolve a secure, developer-friendly infrastructure platform
- Support multi-cloud deployment patterns (AWS primarily, with Azure and GCP support for enterprise customers)
- Implement tools and automation to improve deployment, rollback, and infrastructure reliability
Requirements:
- 5+ years of experience in DevOps, SRE, or Infrastructure Engineering roles
- Deep experience with Terraform and at least one major cloud provider (AWS strongly preferred)
- Strong Kubernetes skills: deploying, debugging, and scaling real workloads
- Proficient in scripting or programming (Python, TypeScript, or Go)
- Experience supporting production systems and responding to incidents
- Comfortable working directly with customers in a support or deployment context
- Bonus: experience with multi-cloud environments or self-hosted enterprise software