Wizard AI is building a high-performing AI Shopping Agent and is seeking a Senior MLOps Engineer to help run its machine learning systems reliably in production. The role involves owning the end-to-end lifecycle of ML systems, improving production ML pipelines, and collaborating with the ML, Data, Product, and DevOps teams to improve system performance and scalability.
Responsibilities:
- Build and improve production ML pipelines, making it easy to move models from experimentation to reliable production use
- Help own and evolve our multi-engine inference platform (LLMs, embeddings, and extraction), improving how different workloads are served and scaled
- Put strong foundations in place for model versioning, rollouts, and rollbacks so systems stay reproducible and safe to iterate on
- Define and monitor key system metrics like latency, availability, and GPU utilization, and set clear expectations around performance
- Improve overall system performance, whether that’s reducing latency, increasing throughput, or making better use of GPU resources
- Design systems that are resilient and cost-aware, with thoughtful approaches to autoscaling, failure isolation, and graceful degradation
- Bring solid engineering practices (testing, CI/CD, observability) into ML workflows to help the team move faster without sacrificing reliability
- Partner closely with ML, Data, Product, and DevOps to turn ideas into production-ready systems and help guide technical decisions
Requirements:
- 5–8+ years of experience in software, ML, platform, or infrastructure engineering, with hands-on ownership of production ML systems
- Experience deploying and running LLMs or other deep learning models in real-world environments
- Strong Python skills and a solid foundation in software engineering
- Familiarity with cloud platforms (AWS, GCP, Azure) and common ML tooling (model registries, experiment tracking, etc.)
- A good understanding of inference performance: batching, memory usage, quantization, and how systems behave across CPU and GPU
- Experience working with (or curiosity about) systems that serve different types of models with different constraints
- Ability to think through tradeoffs between speed, cost, and reliability in a practical way
- Comfort working in a fast-moving environment where things evolve quickly