Instacart is transforming the grocery industry by building AI-powered checkout experiences for in-store shopping. As a Senior Staff Engineer in Computer Vision/AI, you will develop novel models and algorithms, optimize data infrastructure, and collaborate with multiple teams to enhance basket accuracy and reduce shrinkage for retail partners.
Responsibilities:
- Invent and build novel CV models and algorithms from scratch — detection, classification, tracking, and multi-sensor fusion — to replace heuristic-based basket understanding with robust, ML-driven accuracy
- Own end-to-end data infrastructure: design and operate labeling pipelines, curation strategies, active learning loops, and annotation quality controls that continuously improve model performance. Devise labeling schemes that make data reusable across all Caper domains
- Own the model-to-device pipeline: optimize models for Caper’s edge hardware using quantization, pruning, TensorRT/ONNX, and CUDA-level tuning to meet real-time latency and power budgets
- Partner with AI Infrastructure and Instacart’s ML platform teams to define where and how models run — evaluating tradeoffs across on-device, cloud, and third-party vendor options (e.g., NVIDIA)
- Translate business goals — basket accuracy, shrink reduction, retailer tooling — into system requirements; define metrics and run rigorous offline and online experiments to validate and communicate impact
- Define technical strategy and roadmap for CV/AI; serve as the primary technical voice in cross-functional decisions with Device, Backend, AI Infrastructure, and Product
- Establish engineering standards for model development and deployment that scale org-wide; grow engineers through mentorship and design leadership
- Ensure in-store image and video systems are designed with privacy, security, and compliance requirements built in from the start
Requirements:
- 8+ years owning end-to-end computer vision or deep learning systems in production — with a demonstrated track record of building models and algorithms from scratch, not just adapting or fine-tuning existing ones
- Bachelor's in Computer Science, Electrical Engineering, or related field, or equivalent experience
- Demonstrated ability to design and ship novel CV architectures using Python and PyTorch (or TensorFlow), and to own inference and performance-critical components in C++
- Proven ownership of the model-to-device pipeline, including leading the deployment of production models on edge hardware under strict latency and power constraints using TensorRT, ONNX, CUDA, quantization, and pruning
- Led the design and operation of large-scale data and labeling pipelines — including taxonomy decisions, active learning strategy, and quality standards — as foundational 0-to-1 infrastructure, not an afterthought
- Track record of framing ambiguous business problems as tractable engineering workstreams, driving cross-functional alignment, and communicating measurable impact through rigorous offline and online experiments
- Owned distributed training and inference infrastructure on cloud/GPU platforms (AWS/GCP/Azure, Docker/Kubernetes) and driven architectural decisions including vendor and platform strategy (e.g., NVIDIA, third-party) that improved reliability or velocity org-wide
- Established engineering standards, grown engineers through mentorship and design leadership, and shaped technical direction across teams — not just within a single squad
Preferred qualifications:
- Graduate degree (MS or PhD) in Computer Vision, Machine Learning, Robotics, or related field
- Experience with multi-modal perception and sensor fusion (RGB, depth, weight sensors) for product identification and tracking
- Background in retail, point-of-sale, or fraud/shrink detection systems
- Strong MLOps experience (MLflow, Airflow, monitoring and alerting, data versioning and version control)
- Low-level performance expertise: custom CUDA kernels, graph optimization, NVIDIA GPU tooling
- Demonstrated 0-to-1 track record: scoping ambiguous problems, driving cross-functional alignment, shipping novel systems to production
- Big plus: you have owned model evaluation platforms and pipelines across cloud and edge, know how to build the infrastructure that shows how models are performing in production, and know the techniques that prevent drift and improve performance