Wizard AI is developing a leading AI Shopping Agent that recommends top products with exceptional accuracy and quality. They are seeking a Senior Machine Learning Engineer to oversee the production lifecycle of ML serving systems, ensuring reliability, efficiency, and scalability in a dynamic production environment.

Responsibilities:

Own and evolve our multi-engine inference platform, supporting a variety of model types and serving requirements
Build and improve production ML pipelines — taking models from experimentation to reliable, high-throughput serving
Define and implement model versioning, rollout, rollback, and lifecycle management strategies that ensure reproducibility and operational reliability
Define and enforce serving-layer SLAs, including latency, availability, GPU utilization, Time-to-First-Token (TTFT), and Inter-Token Latency (ITL)
Build observability, monitoring, alerting, and operational tooling for production inference systems
Apply software engineering best practices, including testing, CI/CD integration, and reproducibility across ML workflows
Optimize inference performance through efficient resource utilization, hardware-aware serving strategies, and cost-conscious infrastructure design
Ensure ML serving systems are secure, scalable, and operationally resilient
Partner with ML, Data, Product, and DevOps teams to turn ideas into production systems, driving the technical decisions on serving and scale

Requirements:

Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience
5–8+ years of experience in Software Engineering, ML Engineering, Platform Engineering, or Infrastructure Engineering, with direct ownership of production ML serving systems
Hands-on experience running an LLM serving engine (vLLM, TGI, TensorRT-LLM, or SGLang) in production under real load — not just managed or hosted endpoints
Strong Python skills and software engineering fundamentals, combined with deep systems and infrastructure knowledge
Experience with cloud platforms such as AWS, GCP, or Azure, and familiarity with ML lifecycle tooling, experimentation platforms, and model registries
Strong grasp of inference performance — continuous batching, KV-cache and GPU-memory behavior, quantization, and CPU-versus-GPU bottlenecks — with the instinct to profile before tuning
Experience serving heterogeneous workloads, including LLMs, embedding models, and extraction models, each with distinct latency, throughput, and scaling requirements
Demonstrated ability to balance latency, throughput, reliability, and infrastructure cost while operating production-scale ML systems
Experience in high-growth startup environments and comfort operating in fast-moving, evolving technical landscapes

Senior Machine Learning Engineer (Inference Platform)
