Perplexity is seeking an Inference Engineering Manager to lead their AI Inference team, responsible for building and scaling the infrastructure for their products and APIs. The role involves owning the technical direction of inference systems while leading a team of engineers and collaborating with ML research teams.

Responsibilities:

Lead and grow a high-performing team of AI inference engineers
Develop APIs for AI inference used by both internal and external customers
Architect and scale our inference infrastructure for reliability and efficiency
Benchmark and eliminate bottlenecks throughout our inference stack
Drive large sparse/MoE model inference at rack scale, including sharding strategies for massive models
Push the frontier by building inference systems that support sparse attention, disaggregated prefill/decode serving, and similar techniques
Improve the reliability and observability of our systems and lead incident response
Own technical decisions around batching, throughput, latency, and GPU utilization
Partner with ML research teams on model optimization and deployment
Recruit, mentor, and develop engineering talent
Establish team processes, engineering standards, and operational excellence

Engineering Manager - Inference

About this role

Responsibilities: