MD Anderson Cancer Center is a world-renowned cancer center focused on clinical AI and medical imaging. The Senior Machine Learning Engineer will be responsible for the full lifecycle of clinical computer vision models, including problem definition, model development, deployment, and performance monitoring in real-world workflows.
Responsibilities:
- Own the full lifecycle of medical imaging ML models—from problem definition and model development to deployment, monitoring, maintenance, and retirement
- Participate as a technical owner in formal governance, release, and incident review processes, with clear escalation paths and responsibilities
- Translate clinical imaging use cases into deployable AI solutions with defined evaluation metrics, operating thresholds, and reproducible implementation strategies
- Design and execute post-deployment monitoring, including detection and mitigation of model degradation due to distribution shift, scanner changes, or labeling variability
- Collaborate with ML platform, data science, IT, and clinical operations teams to deploy and operate models in secure enterprise environments
- Maintain responsible AI practices, ensuring traceability of data, models, experiments, and documentation of limitations and failure modes
- Contribute to fallback, rollback, and model decommissioning strategies to support patient safety and operational continuity
- Engage clinical, technical, and operational partners to support safe adoption and communicate model risks, behaviors, and performance
- Mentor junior team members and contribute to best practices, review standards, and reproducible ML workflows
Requirements:
- Bachelor's degree in Computer Science, Software Engineering, Data Science, Physics, Math & Statistics, or another related engineering discipline
- Five years of experience in machine learning engineering, data science, data engineering, and/or software engineering
- Experience developing, deploying, and operating medical imaging ML models in regulated clinical environments
- Ability to build imaging data pipelines involving DICOM workflows, dataset versioning, and distributed training
- Deep proficiency in Python and PyTorch for model training and inference under GPU and memory constraints
- Experience orchestrating ML workflows using Airflow, Prefect, or similar DAG-based systems
- Skilled in deploying containerized ML workloads on enterprise cloud platforms such as Azure using Kubernetes
- Understanding of audit-ready model tracking, lineage, and controlled promotion workflows
- Ability to scope medical imaging ML projects end to end, considering clinical and regulatory constraints
- Experience designing validation strategies aligned with governance, regulatory expectations, and change control processes
- Knowledge of healthcare data privacy requirements as they relate to medical imaging and clinical metadata
- Ability to evaluate model performance quantitatively in the context of clinical workflows and operational realities
- Experience engaging clinicians, patient safety, and business stakeholders to communicate model performance, impacts, and risk considerations
- Ability to assess model generalizability and failure modes across scanners, sites, and populations
- Ability to collaborate effectively with data scientists, ML engineers, software teams, clinicians, and operational leaders to integrate imaging models into real workflows
- Ability to produce clear, comprehensive technical documentation, including design specs, validation reports, and runbooks
- Ability to communicate project risks, timelines, and outcomes to leadership and governance bodies
- Willingness to contribute to internal technical standards, best practices, and shared ML development frameworks
- Ability to present technical and non-technical updates clearly across multiple stakeholder groups
- Master's degree or PhD with a concentration in Science, Engineering, or a related field
- With a Master's degree, three years of experience required
- With a PhD, one year of experience required
- Experience operating medical imaging ML systems across multiple sites, scanners, or protocols, rather than a single controlled environment
- Experience handling post-deployment failures, including performance degradation, clinical incidents, model updates, or corrective actions
- Experience raising the technical bar for team members, such as establishing reproducibility practices, review standards, or shared patterns
- Experience technically evaluating third-party medical imaging AI within clinical workflows