Samsara is a pioneer of the Connected Operations™ Cloud, helping organizations improve safety, efficiency, and sustainability through IoT data. They are seeking a Senior Machine Learning Engineer to lead the architectural evolution of their safety systems, focusing on building a robust infrastructure for real-time decision-making from sensor data.
Responsibilities:
- Architect a Unified Perception Layer: Lead the transition from fragmented, task-specific models to a modular perception platform that supports reusable components and downstream safety applications
- System Design: Design and implement real-time ML systems—from sensor ingestion and tracking to risk reasoning and actuation—ensuring clear interfaces and predictable system behavior
- Hybrid Deployment: Orchestrate model integration across edge and cloud environments, managing versioning, rollouts, and mission-critical fallback mechanisms
- Latency Ownership: Own end-to-end latency and reliability for safety-critical pipelines. You will profile the pipeline, tune scheduling, and optimize messaging and backpressure handling across the entire stack
- Observability & Feedback Loops: Build sophisticated monitoring for deployed models to detect drift, false positives/negatives, and latency regressions. You will "close the loop" to ensure production data informs the next iteration of training
- Safety Cases: Develop evaluation frameworks specifically for rare "long-tail" safety events. You will define metrics and build targeted test sets that form the basis for principled ship/no-ship decisions
- Explainability: Partner with Applied Scientists to ensure research outputs are translated into production code that is not only performant but also debuggable and explainable
- Strategic Influence: Shape the system abstractions early in the platform transition to minimize technical debt and maximize future scalability
- Mentorship: Set the engineering standard for correctness and performance. You will mentor junior and mid-level engineers, fostering a culture of rigorous ML engineering
Requirements:
- 6+ years of experience in ML Engineering, with a proven track record of shipping models in production (ideally in safety-critical domains like robotics, automotive, or industrial AI)
- Deep understanding of distributed systems, performance profiling, and computer vision
- Experience with Cloud ML workflows (AWS/GCP/Azure) and containerization, paired with an understanding of the constraints of edge hardware
- You don't just write code; you design systems. You understand the trade-offs between model complexity and operational reliability
- Ph.D. in Computer Science or a quantitative discipline (e.g., Applied Math, Physics, Statistics)
- Experience with containerization technologies (e.g., Docker, Kubernetes), continuous integration/continuous deployment (CI/CD) pipelines, and infrastructure-as-code (IaC) frameworks
- Familiarity with deploying and managing ML applications in cloud environments, as well as leveraging cloud-based services for data storage, processing, and inference
- Experience building end-to-end ML applications from scratch