MD Anderson Cancer Center is dedicated to eliminating cancer through outstanding programs that integrate patient care, research, prevention, and education. They are seeking a Senior MLOps Engineer to advance MLOps initiatives, orchestrating AI lifecycle management and supporting the development of production-quality machine learning models to enhance cancer care.
Responsibilities:
- Oversee the lifecycle of AI models, encompassing training, evaluation, deployment, monitoring, and maintenance of production-quality machine learning models, in compliance with standards and best practices
- Develop CI/CD pipelines for ML model training, deployment, and monitoring while upholding security, scalability, reliability, reproducibility, and performance
- Provide rigorous testing, versioning, and documentation, ensuring impact, risk mitigation, and reproducibility
- Develop and support a culture of responsible AI by minimizing bias, enhancing fairness, and maximizing transparency in AI models
- Maintain diligent records of model development experiments, data and model lineage tracking, as well as data and model scorecards
- Engage with stakeholders to gather requirements, convey AI concepts understandably, and capture feedback
- Design fallback and decommissioning strategies for AI solutions to ensure operational continuity
- Support the evaluation and onboarding of third-party machine learning models, ensuring they meet institutional standards, enhance institutional value, and minimize organizational risk
- Deliver training on AI solutions to enhance understanding and application across the organization
- Engage with technology trends, contribute to tech communities, and foster a culture of continuous learning and innovation
Requirements:
- Bachelor's degree in Computer Science, Software Engineering, Data Science, Physics, Math & Statistics, or another related engineering discipline
- Five years of experience in machine learning engineering, data science, data engineering, and/or software engineering
- Proficient in developing, deploying, and maintaining AI/ML algorithms in production environments
- Skilled in constructing scalable data pipelines, feature and artifact management, and analytics
- Experienced with MLOps tools and processes for data, code, and model management
- Strong proficiency in Python and either C++ or C#, with practical knowledge of TensorFlow, PyTorch, and Scikit-learn
- Knowledgeable about AI/ML platform infrastructure, including cloud and on-premises architectures
- Familiar with cloud-native tools, services, and computing environments (e.g., Azure, AWS, GCP)
- Proficient in DevOps practices and CI/CD pipelines, including Azure DevOps and GitHub Actions
- Experienced with containerization using Docker and orchestration with Kubernetes, along with DAG tools
- Skilled in project management methodologies (SAFe agile, PRINCE2, Lean) for end-to-end AI/ML project lifecycle management
- In-depth knowledge of AI/ML Model Lifecycle Management aligned with ISO standards for software and AI development
- Proficient in decision-making, problem-solving, and executing AI/ML healthcare solutions
- Skilled at quantitatively assessing machine learning models for performance, workflow impact, and potential risks
- Adept at collaborating with vendors and partners to evaluate and integrate third-party AI solutions into current systems and processes
- Competent in identifying risks and formulating mitigation plans to prevent project delays
- Collaborate with data scientists, ML engineers, and software engineers to integrate machine learning models into existing systems
- Document CI/CD pipelines, deployment workflows, and infrastructure setups
- Report project metrics, including progress, impact, and risks, to leadership, offering strategic recommendations for AI/ML use-case prioritization
- Manage stakeholder relations to facilitate solution adoption and address issues
- Share knowledge and offer technical assistance to researchers and colleagues
- Deliver both technical and non-technical updates in meetings and at professional gatherings
- Engage effectively with team leaders, peers, end-users, and support staff as needed
Preferred Qualifications:
- Master's degree
- Experience developing MLOps pipelines for computer vision AI models
- Hands-on experience developing custom machine learning algorithms from scratch (e.g., in NumPy or PyTorch)
- Designed and implemented a shared machine learning service that is used across multiple teams or production projects
- Led the development of systems that automate the deployment and maintenance of multiple machine learning models into user-facing products
- Five years of industry experience in data science, with at least three of those years as a Senior Machine Learning Engineer