Baseten powers mission-critical inference for leading AI companies. They are seeking a Software Engineer focused on Voice AI to lead the development of their in-house inference stack, collaborating with engineering teams across the company to enhance Voice AI capabilities.
Responsibilities:
- Own and lead Voice AI product areas end-to-end — from architecture and system design through implementation, rollout, and long-term production operations
- Design, build, and operate real-time, large-scale, high-performance model serving systems for STT, TTS, and voice agent workloads with clear SLOs for mission-critical customer deployments
- Drive cross-team collaboration with sister engineering teams to solve full-stack technical problems, align on priorities, and coordinate end-to-end delivery across the product surface area
- Mentor teammates through code reviews, design docs, and technical leadership
Requirements:
- Bachelor's degree or higher in Computer Science or related field
- Proven track record owning production-grade real-time, large-scale systems where tail latency (p99) matters
- Proficiency in one or more popular programming or scripting languages; Python is a plus
- Good taste in product, particularly developer-oriented tools
- Interest in ML/AI infrastructure and willingness to learn
- Strong collaboration and communication skills
- Comfortable using AI coding assistants (e.g., Claude Code, Codex, Cursor) as a daily productivity multiplier — as an AI-native company, Baseten sees this as a must-have skill
- Experience implementing pipeline-level model runtime optimizations such as dynamic batching, async scheduling, or decode-side throughput improvements
- Experience building developer platforms: SDKs, CLIs, APIs, and self-serve workflows for ML or infrastructure products
- Experience with containerization and orchestration technologies (Docker, Kubernetes), service meshes, or distributed scheduling
- Familiarity with speech/audio ML models (STT, TTS, speech-to-speech)
- Familiarity with model-serving runtimes (e.g., vLLM, TensorRT, ONNX Runtime)
- Familiarity with systems-level performance profiling across host-device boundaries (e.g., PyTorch Profiler) and with diagnosing GPU utilization issues
- Exposure to customer-facing engineering: pre-sales prototyping, technical discovery, or working directly with customers to ship solutions