Afresh is an AI platform for the grocery industry, focused on reducing food waste and improving operational efficiency. The Staff ML Platform Engineer will enhance the core ML platform, enabling the company's machine learning and applied science teams to innovate faster and ship impactful solutions.
Responsibilities:
- In your first 3 months, you'll partner with ML, Applied Science, and engineering leadership to identify the highest-leverage gaps in our ML platform and shape a multi-quarter technical strategy to close them
- By the end of your first 6 months, you will have driven a platform-level initiative that meaningfully changes what's possible at Afresh, such as establishing the architecture for real-time inference across the company, redesigning model configuration and deployment end-to-end, or rebuilding our distributed inference layer for an order-of-magnitude increase in scale. Along the way, you'll raise the bar across the ML org through design reviews and mentorship of other senior engineers
Requirements:
- BS in Computer Science or a relevant technical field
- 7+ years of professional software development experience with a proven track record of shipping high-quality applications and services
- Experience working collaboratively with machine learning engineers, data scientists, or applied scientists on large-scale software projects involving machine learning models
- You possess a genuine curiosity about ML modeling (e.g., demand forecasting, state estimation, ordering policy). You aren't just building 'pipes'; you want to understand what is flowing through them
- You deeply understand how scientists work, and you build tools that bridge the gap between a research notebook and production-grade software
- Technical leadership experience and a demonstrated ability to mentor junior engineers
- Deep expertise in library design, API design, data structures, and algorithms
- Strong familiarity with the Python ecosystem (NumPy, pandas, PyTorch, PySpark). While our stack is Python-heavy, we value engineers who are stack-agnostic and focus on solving fundamental distributed systems problems
- Proven ability to architect high-throughput distributed inference systems using Spark, Dask, or Ray
- Experience engineering robust data architectures, with a focus on schema evolution and performance tuning
- Experience with end-to-end orchestration (lineage tracking, resource scheduling, self-healing pipelines)