The University of Texas MD Anderson Cancer Center is seeking a Senior Machine Learning Operations Engineer to support enterprise-wide artificial intelligence initiatives within Data Impact & Governance. This role involves building, deploying, and sustaining production-quality machine learning systems while collaborating with various stakeholders to ensure AI solutions are scalable and responsible.
Responsibilities:
- Oversee end-to-end AI model lifecycles including training, evaluation, deployment, monitoring, and maintenance of production-quality machine learning models
- Design and implement CI/CD pipelines for model training, deployment, monitoring, and retraining with a focus on security, scalability, reliability, reproducibility, and performance
- Implement rigorous testing, versioning, and documentation practices to support reproducibility, risk mitigation, and measurable impact
- Maintain comprehensive experiment tracking, data lineage, model lineage, and model scorecards
- Design fallback, rollback, and decommissioning strategies to ensure operational continuity of AI solutions
- Promote responsible AI practices by minimizing bias, enhancing fairness, and maximizing transparency in machine learning models
- Ensure AI lifecycle management aligns with institutional standards and best practices
- Support assessment, validation, and onboarding of external machine learning models and AI-driven products to minimize organizational risk and maximize value
- Develop and maintain scalable data pipelines, feature stores, and artifact management systems
- Deploy and operate ML workloads across cloud (e.g., Azure, AWS, or GCP) and on-premises environments
- Use containerization and orchestration technologies such as Docker, Kubernetes, and DAG-based workflow tools
- Apply DevOps and MLOps tools including Azure DevOps, GitHub Actions, and version control systems
- Collaborate with stakeholders to gather requirements, translate AI concepts into understandable terms, and incorporate feedback
- Partner with data scientists, ML engineers, and software engineers to integrate models into enterprise systems
- Deliver training and knowledge sharing to enhance AI understanding and adoption across the organization
- Report project progress, impact, risks, and recommendations to leadership
- Stay current with emerging technology trends in AI, MLOps, and healthcare analytics
- Contribute to internal and external technical communities
- Foster a culture of continuous improvement, innovation, and learning across teams
- Perform other duties as assigned
Requirements:
- Bachelor's degree in Computer Science, Software Engineering, Data Science, Physics, Math & Statistics, or a related field
- Five years of experience in machine learning engineering, data science, data engineering, and/or software engineering
- With a Master's degree, three years of experience required
- With a PhD, one year of experience required
Preferred:
- Master's-level degree
- Experience developing MLOps pipelines for computer vision AI models
- Hands-on experience developing custom machine learning algorithms from scratch (e.g., in NumPy or PyTorch)
- Experience designing and implementing a shared machine learning service used across multiple teams or production projects
- Experience leading the development of systems that automate the deployment and maintenance of multiple machine learning models in user-facing products
- Five years of industry experience in data science, including at least three years as a Senior Machine Learning Engineer