Perplexity is seeking an Inference Engineering Manager to lead its AI Inference team, which builds and scales the infrastructure behind its products and APIs. The role involves owning the technical direction of inference systems while leading a team of engineers and collaborating with ML research teams.
Responsibilities:
- Lead and grow a high-performing team of AI inference engineers
- Develop APIs for AI inference used by both internal and external customers
- Architect and scale our inference infrastructure for reliability and efficiency
- Benchmark and eliminate bottlenecks throughout our inference stack
- Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
- Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decode serving, and similar techniques
- Improve the reliability and observability of our systems and lead incident response
- Own technical decisions around batching, throughput, latency, and GPU utilization
- Partner with ML research teams on model optimization and deployment
- Recruit, mentor, and develop engineering talent
- Establish team processes, engineering standards, and operational excellence