Baseten powers mission-critical inference for leading AI companies. They are seeking a Software Engineer focused on Voice AI to lead the development of their in-house inference stack, collaborating with engineering teams across the company to enhance Voice AI capabilities.
Responsibilities:
- Own and lead Voice AI product areas end-to-end — from architecture and system design through implementation, rollout, and long-term production operations
- Design, build, and operate real-time, large-scale, high-performance model serving systems for STT, TTS, and voice agent workloads with clear SLOs for mission-critical customer deployments
- Drive cross-team collaboration with sister engineering teams to solve full-stack technical problems, align on priorities, and coordinate end-to-end delivery across the product surface area
- Mentor teammates through code reviews, design docs, and technical leadership
Requirements:
- Bachelor's degree or higher in Computer Science or related field
- Proven track record owning production-grade real-time, large-scale systems where tail latency (p99) matters
- Proficiency in one or more popular programming or scripting languages; Python is a plus
- Good taste in product, particularly developer-oriented tools
- Interest in ML/AI infrastructure and willingness to learn
- Strong collaboration and communication skills
- Comfortable using AI coding assistants (e.g., Claude Code, Codex, Cursor) as a daily productivity multiplier — as an AI-native company, Baseten sees this as a must-have skill
- Experience implementing pipeline-level model runtime optimizations such as dynamic batching, async scheduling, or decode-side throughput improvements
- Experience building developer platforms: SDKs, CLIs, APIs, and self-serve workflows for ML or infrastructure products
- Experience with containerization and orchestration technologies (Docker, Kubernetes), service meshes, or distributed scheduling
- Familiarity with speech/audio ML models (STT, TTS, speech-to-speech)
- Familiarity with model-serving runtimes (e.g., vLLM, TensorRT, ONNX Runtime)
- Familiarity with systems-level performance profiling across host-device boundaries (e.g., PyTorch Profiler) and with diagnosing GPU utilization issues
- Exposure to customer-facing engineering: pre-sales prototyping, technical discovery, or working directly with customers to ship solutions