R1 RCM is building healthcare’s first Revenue Operating System, applying AI to hospital billing and reimbursement. This role owns the production runtime for Phare’s ML stack: deploying and scaling models while ensuring reliability and observability.
Responsibilities:
- You’ll own the production runtime for Phare’s ML stack: deploying, serving, and scaling models across inference endpoints and batch/streaming workflows
- You’ll build progressive delivery pipelines with automated rollouts and rollbacks, manage SLOs for latency and availability, and instrument end-to-end observability (metrics, logs, traces, drift, regression)
- You’ll harden the platform with Terraform, Kubernetes, and CI/CD, ensuring reproducible, auditable ML releases
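To make the progressive-delivery responsibility concrete, here is a minimal sketch of the promote/rollback gate such a pipeline evaluates: canary metrics are compared against the SLO thresholds for latency and availability. All names are hypothetical; a real pipeline would typically delegate this to a tool like Argo Rollouts or Flagger rather than hand-rolled code.

```python
from dataclasses import dataclass


@dataclass
class SloThresholds:
    """SLO budget the canary must stay within."""
    max_p99_latency_ms: float
    max_error_rate: float


@dataclass
class CanaryMetrics:
    """Observed metrics for the canary over the evaluation window."""
    p99_latency_ms: float
    error_rate: float


def rollout_decision(metrics: CanaryMetrics, slo: SloThresholds) -> str:
    """Promote the canary only if every SLO holds; otherwise roll back."""
    if metrics.p99_latency_ms > slo.max_p99_latency_ms:
        return "rollback"
    if metrics.error_rate > slo.max_error_rate:
        return "rollback"
    return "promote"
```

In practice the same decision function would run at each step of a staged traffic shift (e.g. 5% → 25% → 100%), so a single SLO breach triggers an automated rollback before the release reaches full traffic.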
Requirements:
- At least 5 years of relevant industry experience in software engineering
- At least 2 years of direct MLOps experience
- Experience deploying and operating GPU-backed models in production, serving both API and batch/streaming inference
- Strong proficiency with Docker/Kubernetes, IaC (e.g., Terraform), and CI/CD for both services and model artifacts
- Experience maintaining environment parity, reproducible releases, and robust model/experiment versioning with data lineage
- Experience using progressive delivery with automated rollouts/rollbacks
- Experience building end-to-end observability (metrics, logs, traces, and model telemetry for drift/regression)
- Experience with actionable alerting, runbooks, and incident response
- Experience managing model registries and stage gates
- Experience designing scheduled or event-driven retraining when appropriate
- Experience enforcing RBAC, secrets management, encryption, and audit logs
- Experience in regulated environments (e.g., healthcare, finance)
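As an illustration of the drift/regression telemetry expected above, one common technique is a population stability index (PSI) check comparing a reference distribution of model scores against live traffic. A minimal, dependency-free sketch, assuming equal-width binning and conventional rule-of-thumb thresholds (PSI < 0.1 stable, > 0.25 significant drift), neither of which is specific to this role:

```python
import math


def population_stability_index(expected: list[float], actual: list[float],
                               bins: int = 10) -> float:
    """PSI between a reference and a live distribution of model scores.

    By convention, PSI < 0.1 is treated as stable and PSI > 0.25 as
    significant drift worth alerting on.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range

    def histogram(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        total = len(values)
        # Smooth empty buckets so the log term stays finite.
        return [max(c / total, 1e-6) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A scheduled job would compute this over each day’s scores and emit the value as model telemetry, so the same alerting stack that watches latency and error-rate SLOs can also page on distribution shift, and can feed the scheduled or event-driven retraining mentioned above.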