Paramount is on a mission to unleash the power of content and is seeking a Senior Machine Learning Operations Engineer to own the operational layer around its personalization and recommendation machine learning (ML) systems. The role involves ensuring the reliability and performance of ML models, collaborating with DevOps and ML engineers, and building monitoring and diagnostic tooling for production systems.

Responsibilities:

Own model traceability: Every model in production should have clear lineage: what data trained it, what code produced it, what validation it passed, and how it is performing. Evaluate and recommend tooling for versioning, metadata, and model registries, and work with ML engineers (MLEs) to drive adoption
Build end-to-end monitoring: Monitor the full signal path: data arrival, feature distribution stability, model metrics, and serving latency against SLA. Own this yourself rather than relying solely on upstream teams to catch their own issues
Partner with Data Engineering on data quality: Collaborate to surface data quality issues, detect drift in upstream sources, and ensure features stay fresh and reliable
Detect issues proactively: Track drift over weeks, flag slow degradation before it crosses a threshold, and surface feature freshness problems before they cascade
Build diagnostic tooling: When something goes wrong, get from "recommendations look off" to root cause in minutes. That means ensuring the right context is logged at each stage (candidates, features, serving context) and building the dashboards that tie it all together
Own incident response for ML systems: Maintain rollback playbooks and pre-defined hotfix strategies with quantified tradeoffs. Own automated gates that block bad deployments. Run post-mortems and close the gaps
Coordinate on post-deployment metrics: Work with ML engineers, data engineers, and stakeholders to define what metrics to collect after deployment and why they matter

Requirements:

5+ years in ML engineering, applied ML, or a related ML role, with demonstrated experience on the operational side of monitoring, reliability, deployment, or incident response
Has built or operated model registries, ML monitoring systems, or production ML pipelines
Understands ML systems end to end: not just the infrastructure layer, but why a stale feature or a shifted distribution matters
Strong SQL skills and comfort digging into data distributions, feature health, and model behavior
Comfortable partnering with DevOps and Platform teams to define infrastructure needs without needing to own the infra yourself
Experience operating recommendation or personalization systems at scale

Senior Machine Learning Operations Engineer, ModelOps and Runtime Platform Engineering