Monogram Health is a leading multispecialty provider of in-home, evidence-based care for complex patients with multiple chronic conditions. They are seeking a Staff Engineer in Machine Learning Operations to architect and scale machine learning infrastructure while mentoring teams and driving strategic decisions that impact patient outcomes.

Responsibilities:

Architect and maintain enterprise-grade ML infrastructure, including model versioning, automated testing frameworks, containerization strategies, CI/CD pipelines, and comprehensive monitoring systems for model performance, data quality, and drift detection
Drive MLOps strategy and standards across the organization. Mentor data scientists and engineers on production best practices, system design, and scalable architecture patterns
Own the complete journey from model development through production deployment, including real-time and batch inference systems, A/B testing frameworks, and automated retraining pipelines
Collaborate with clinical leaders, product teams, and data scientists to translate complex healthcare requirements into robust, scalable ML solutions. Present technical strategies to executive stakeholders
Build fault-tolerant, compliant systems that meet healthcare security and privacy standards. Establish SLAs, incident response protocols, and disaster recovery procedures for mission-critical ML services
Evaluate and integrate cutting-edge MLOps tools and practices. Design systems that scale with Monogram's growth while reducing operational overhead and improving model iteration velocity

Requirements:

Bachelor's degree in computer science, engineering, or related field required; master's degree preferred
Minimum of ten (10) years in software engineering, with five (5) years focused on ML infrastructure, MLOps, or production ML systems and Python development with strong software engineering fundamentals, and three (3) years architecting and deploying production ML systems on cloud platforms (Azure preferred)
Proven track record building and scaling ML platforms from the ground up
Expert-level proficiency with MLOps tooling (MLflow, Kubeflow, SageMaker, Azure ML, etc.)
Deep experience with containerization (Docker, Kubernetes), orchestration tools (Airflow, Prefect), and infrastructure-as-code (Terraform, ARM templates)
Advanced knowledge of CI/CD systems, automated testing strategies, and GitOps workflows
Data engineering skills: SQL, Spark/PySpark, Databricks, data pipeline optimization
Expertise in model monitoring, observability, feature stores, and experiment tracking at scale
Production experience with both batch and real-time inference architectures
Demonstrated ability to influence technical direction and mentor senior engineers
Proven communication skills, with the ability to distill complex technical concepts for diverse audiences
Track record of driving consensus on architectural decisions across multiple stakeholders
Systems thinking skills with focus on reliability, scalability, and maintainability
Healthcare or regulated industry experience strongly preferred
Understanding of healthcare data standards (FHIR, HL7, claims data) is a plus
Understanding of security, compliance, and privacy requirements in healthcare (HIPAA) preferred
Bias toward action with pragmatic approach to technical debt and iterative improvement preferred
