Manage a machine learning operations function responsible for ensuring rapid, low-risk launches, reliable real-time inference and training data pipelines, and strong monitoring/incident response for data science models and decisioning services
Work closely with Data Science & Decision Science teams to enable enterprise capabilities that deliver personalization/next best action at scale for Workplace plan participants and Personal Wealth clients
Manage an agile, high-output team responsible for production deployment and operations of ML models and decisioning services
Establish the decisioning “productionization pathway” aligned to center of excellence (COE) standards: from trained model artifacts to governed, monitored production releases
Own execution across multiple data science and decision initiatives, ensuring delivery predictability, quality gates, and clear operational readiness
Implement, or oversee the implementation of, reproducible training/scoring workflows using enterprise data foundations, in partnership with Data Science & Decision Science teams
Build/operate feature pipelines and serving integrations on top of COE-owned data and cloud data platforms (e.g., Snowflake, Redshift) encompassing curated tables, governed access, and shared compute patterns
Ensure consistent documentation, data lineage, auditability, and compliance with enterprise data governance
Operationalize CI/CD for model releases using COE-approved toolchains and patterns (testing, packaging, artifact promotion, environment parity)
Execute controlled pilots and rollouts in partnership with Decision Science for decisioning services, ensuring: a) clear launch criteria and success metrics defined with Data Science & Decision Science teams, b) automated smoke tests and validation, and c) rapid rollback mechanisms and runbooks
Partner with the COE to standardize and improve these patterns for enterprise reuse
Own decisioning inference integration patterns: real-time endpoints, routing, caching (when appropriate), and SLA-driven performance tuning
Collaborate with Decision Science and Technology to enable policy/ranking updates and experimentation hooks (traffic splits, exposure logging, assignment consistency)
Ensure the decisioning inference layer is production-grade in partnership with the COE: authenticated, secure, scalable, and observable
Design, implement, and maintain monitoring for decisioning services and data science models, including operational, data, and model health indicators
Ensure dashboards and alerts are actionable and visible to partners, and that issue diagnosis paths are documented
Partner closely with Data Science & Decision Science leadership to design, build, and continuously improve a shared internal suite of ML and decisioning tools leveraged across both teams
Drive alignment on standards, reusable components, and operating patterns that increase velocity while reducing friction from research to production
Partner with the COE on combined MLOps SLOs/SLAs, disaster recovery plans, and resource coverage models to ensure continuous reliability of deployed solutions
Act as the primary interface from Data Science & Decisioning to the ML Eng & Ops COE for platform needs and roadmap input
Requirements
Expert SQL and Python skills
Bachelor’s degree in Computer Science, Engineering, Business, or related field required; MBA or advanced degree preferred
7+ years in software engineering, platform engineering, data/ML engineering, or MLOps (or equivalent experience)
3+ years of experience leading teams
Demonstrated experience deploying and operating production ML systems, especially real-time services
Strong foundation in modern engineering practices, including CI/CD pipelines, automated testing, release governance, containers and orchestration (Docker/Kubernetes or enterprise equivalent), API/service design (REST/gRPC), performance testing and tuning, and observability (metrics/logs/tracing, alerting)
Experience operating services with on-call, incident response, postmortems, and SLO management
Proven ability to succeed in a matrixed organizational model, driving outcomes through partnership and influence
Demonstrated ability to navigate complex organizations to achieve priorities
Active curiosity about the emerging capabilities of artificial intelligence (AI) and how they will affect functional work now and in the future