Together AI is a research-driven artificial intelligence company focused on building advanced voice applications. They are seeking a Senior Platform Engineer to own the API and infrastructure layer for voice workloads, ensuring the reliability and performance of their Voice AI platform.
Responsibilities:
- Own the real-time API layer (WebSocket + HTTP streaming) that powers Together's voice platform
- Design autoscaling and orchestration for voice workloads running on tens of thousands of GPUs
- Build the developer experience — APIs, observability, and tooling — for a fast-growing product area
- Work with production voice customers (contact centers, AI agents, communication platforms) to ship what they actually need
- Build and harden real-time WebSocket and HTTP streaming APIs for STT and TTS — including connection lifecycle management, backpressure, error handling, and reconnection, at the reliability bar needed for production voice agents
- Design and ship autoscaling for voice model endpoints that handles bursty, real-time traffic patterns — accounting for concurrent connection limits, streaming state, and hard latency ceilings
- Implement voice-specific API features: word-level alignment, real-time speaker diarization, audio format flexibility (G.711 μ-law for telephony, PCM, WebRTC formats), pronunciation controls, and multi-context WebSocket support
- Build voice-specific observability — latency breakdowns, audio quality signals, and dashboards that help both the team and customers debug issues
- Own multi-model normalization across our model partners (Cartesia, Deepgram, Rime, and others), ensuring consistent API behavior regardless of the underlying provider
- Collaborate with the ML engineering side of the team on the interface between the API layer and the model serving stack, ensuring latency and reliability requirements are met end-to-end
- Contribute to developer experience: API design, documentation, integration cookbooks, a playground, and examples that showcase how best-in-class voice agents are built
- Lay the groundwork for multiple new products down the line
Requirements:
- 5+ years of experience building large-scale, real-time distributed systems and API services
- Deep expertise in real-time streaming infrastructure — WebSocket server architecture, Server-Sent Events, bidirectional streaming, connection multiplexing, and stateful protocol design
- Expert-level programming in TypeScript and Python; experience with Rust is a plus
- Strong distributed systems fundamentals: load balancing, autoscaling, rate limiting, and traffic shaping for latency-sensitive workloads
- Experience with Kubernetes — including custom autoscalers, resource management, and health checking for stateful services
- Strong product sense — you care about API ergonomics and think about what developers building voice apps actually need
- Comfort working on a small, early-stage team where you'll wear multiple hats and move fast
- Experience with audio or media protocols (WebRTC, G.711, PCM encoding) is a strong plus
- Familiarity with ML model serving infrastructure and how inference engines work is a plus — you'll interface with the serving layer regularly
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience
- Full-stack experience (React, Next.js) is a nice-to-have for contributing to developer-facing tooling