Samsara is the pioneer of the Connected Operations™ Cloud, enabling organizations to harness IoT data for actionable insights. The Staff ML Engineer will lead the design and evolution of the ML platform, focusing on improving safety outcomes through scalable ML systems.
Responsibilities:
- Design, build, and operate Samsara’s end-to-end ML platform (training, experimentation, batch/online inference, edge) used by multiple Safety AI product teams
- Evolve shared training and experimentation infrastructure (orchestration, clusters, environments) and standardize tracking, evaluation, and regression testing for fast, safe iteration
- Partner with product and applied ML teams to ship ML-powered features (CV models, EcoDriving insights, LLM-based reporting) that improve safety, reliability, and cost efficiency
- Lead throughput and cost modeling for new ML features, from exploration to production-scale capacity planning, to inform roadmap and go/no-go decisions
- Drive experiment design and evaluation: define success metrics, structure A/B or offline tests, and turn results into product and technical decisions
- Design and operate scalable online and batch inference systems (Ray, Spark), including deployment patterns, observability, SLOs, and unified training-to-production workflows
- Partner with firmware and edge teams to package, validate, and deploy models to Samsara devices, and build feedback loops from edge to cloud for continuous improvement
- Own reliability, observability, and security for ML systems across cloud and edge, including on-call practices, incident response, and infrastructure hardening
- Own or co-own end-to-end technical delivery for high-priority or high-risk initiatives, from modeling and system design through production rollout
- Provide Staff+/Senior-Staff technical leadership on ML infrastructure architecture and strategy, influencing cross-team decisions and mentoring engineers and applied scientists
- Drive strong developer experience through documentation, office hours, and best practices, while contributing to and representing Samsara in open source communities (Ray, Spark, RayDP)
- Champion and model Samsara’s cultural principles: Focus on Customer Success, Build for the Long Term, Adopt a Growth Mindset, Be Inclusive, Win as a Team
Requirements:
- 10+ years of overall experience in machine learning engineering or related fields, with a strong track record of building and operating large-scale ML systems
- Strong experience with distributed computing frameworks such as Ray and/or Spark
- Hands-on experience with cloud infrastructure (AWS), containers/Kubernetes, and production observability tooling
- Proven experience building or supporting ML platforms (training, experimentation, or inference) used by multiple teams
- Solid understanding of ML fundamentals including evaluation, experiment design, and model iteration in production environments
- Experience shipping ML-powered features end-to-end, from design through production and iteration, with measurable impact on product or business metrics
- Background in computer vision and/or LLM-based systems in production environments
- Experience with edge or on-device ML and collaboration with firmware or embedded teams
- Familiarity with model lifecycle systems (model registry, deployment, monitoring, rollback, drift detection)
- Experience working in environments with strong security and compliance requirements
- Demonstrated ability to lead across teams and influence technical direction at Staff+ scope
- A strong sense of ownership and a desire for end-to-end autonomy, from platform design to real-world impact