Together AI is a research-driven artificial intelligence company building inference infrastructure for voice applications. The company is seeking a Senior ML Engineer to optimize model serving for voice workloads, ensuring high performance and reliability in real-time voice applications.
Responsibilities:
- Optimize inference performance for voice models — speech-to-text (STT), text-to-speech (TTS), and speech-to-speech — targeting best-in-class time to first byte (TTFB), throughput, and GPU utilization across our curated model set
- Productionize voice models on serverless and dedicated endpoints, including batching strategies, streaming inference, and memory management tailored to audio workloads
- Build and maintain a voice model evaluation framework — measuring word error rate (WER) across accents, languages, and noise conditions for STT, and naturalness, latency, and pronunciation accuracy for TTS
- Enable new model architectures in our serving stack as the field evolves, including audio-native LLMs, codec-based models such as SNAC, and speech-to-speech systems
- Collaborate with model partners to integrate and optimize their models (Cartesia, Deepgram, Rime, and others) running on Together's infrastructure
- Profile and debug performance across the full inference stack — from GPU kernels to framework-level bottlenecks — and ship measurable improvements
- Work with the platform engineering side of the team to ensure the serving layer meets the latency and reliability requirements of real-time voice APIs
- Contribute to voice model fine-tuning capabilities (STT and TTS) as we enable customers to build differentiated voice experiences on Together
- Lay the technical groundwork for multiple future products
Requirements:
- 5+ years of experience in ML engineering, with a focus on model serving, inference optimization, or ML infrastructure
- Hands-on experience with LLM serving engines (vLLM, SGLang, TensorRT-LLM, or similar) — comfortable reading and modifying engine internals, not just using APIs
- Strong proficiency in Python and PyTorch; experience with GPU profiling and optimization (CUDA, memory management, kernel-level debugging)
- Track record of shipping ML systems to production with measurable performance improvements
- Strong product sense — you think about what developers building voice apps actually need, not just what's technically interesting
- Comfort working on a small, early-stage team where you'll wear multiple hats and move fast
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field, or equivalent practical experience
- Experience with speech and audio ML (ASR, TTS architectures, audio signal processing) is a strong plus but not required — you can learn this quickly if you have strong ML engineering fundamentals
- Familiarity with audio codecs and tokenization schemes (SNAC, EnCodec, DAC) is a plus
- Experience training or fine-tuning speech models is a plus