Cohere is a company dedicated to scaling intelligence to serve humanity by training and deploying frontier AI models. The Audio Inference Engineer will optimize audio inference serving efficiency and improve core serving metrics (latency, throughput, and quality) in collaboration with the training and serving infrastructure teams.
Responsibilities:
- Build reliable machine learning systems and optimize audio inference serving efficiency using innovative techniques
- Advance core audio model serving metrics, including latency, throughput, and quality
- Identify bottlenecks and deliver creative solutions for audio processing and streaming workloads
- Collaborate closely with both the training and serving infrastructure teams to ensure seamless integration between model development and deployment
Requirements:
- Significant experience developing high-performance audio or machine learning inference systems
- Proficiency with programming languages such as C++ and Python
- Hands-on experience with deep learning models for audio, speech, or language applications
- A bias for action and a strong results-oriented mindset
- Experience with GPU programming, low-level system optimization, and model parallelization across multiple GPUs
- Experience with duplex real-time streaming architectures
- Familiarity with the internals of machine learning frameworks used for audio (such as PyTorch, TensorFlow, or specialized audio libraries)
- Experience with inference frameworks such as vLLM, SGLang, TensorRT-LLM, or custom distributed inference systems
- Knowledge of sequence modeling (e.g., transformers for audio/speech) and end-to-end audio pipeline optimization