Baseten powers mission-critical inference for the world's most dynamic AI companies. As a Senior Infrastructure Software Engineer, you'll architect and lead the development of our ML inference platform, making key technical decisions to enable developers to deploy and scale ML models.
Responsibilities:
- Design and architect scalable infrastructure systems for our ML inference platform
- Lead optimization of Kubernetes deployments for efficient, cost-effective model serving
- Drive enhancements to our inference orchestration layer for complex model deployments
- Define monitoring strategies for model performance, latency, and resource utilization
- Develop advanced solutions for GPU capacity management and throughput optimization
- Establish infrastructure automation standards to streamline ML deployment workflows
- Partner with other engineers to translate complex inference requirements into technical solutions
- Make critical architectural decisions balancing performance with system reliability
- Lead technical discussions and mentor junior engineers on infrastructure best practices
- Contribute to long-term technical strategy and infrastructure roadmap
Requirements:
- Bachelor's degree or higher in Computer Science or a related field
- 5+ years of experience building production infrastructure systems
- Expert-level proficiency in Go, with Python experience a plus
- Deep expertise with Kubernetes in production environments
- Extensive experience with major cloud providers (AWS, GCP); experience with neo-cloud providers (Crusoe, DigitalOcean, Nebius) a plus
- Advanced understanding of distributed systems concepts and performance tuning
- Proven experience designing observability systems
- Track record of leading technical initiatives and mentoring engineers
- Experience with ML/AI workloads and MLOps platforms highly valued