UT MD Anderson, a leading cancer center focused on innovative healthcare solutions, is seeking a Senior Data Engineer specializing in Healthcare AI. The role involves architecting and building data infrastructure for AI/ML applications to improve patient outcomes and operational efficiency.
Responsibilities:
- Design, implement, and maintain batch and streaming pipelines for ML training, deployment, inference, and monitoring using Azure, Dataiku, and open-source tools
- Deploy and manage raw data, feature, and vector stores to enable fast, reliable access for production AI/ML systems
- Use Infrastructure-as-Code (IaC) and CI/CD workflows to automate deployments, improving reliability and efficiency
- Implement validation, lineage, anomaly detection, and drift monitoring to deliver accurate, compliant data
- Enforce encryption, RBAC, tokenization, and audit logging to ensure HIPAA/HITRUST compliance while enabling scalable AI operations
- Partner with data engineers, ML engineers, data scientists, and clinical stakeholders to deliver scalable AI solutions
- Mentor team members and drive best practices in data engineering
- Manage pipelines and infrastructure end-to-end, including monitoring, alerting, incident management, and continuous improvement
- Perform additional tasks as assigned to support departmental goals
Requirements:
- Bachelor's degree
- Five years of relevant information technology experience, or three years with the preferred degree. Additional years of related experience may substitute for the required education on a one-to-one basis
- Expert in Python, SQL, Spark, and modern data engineering frameworks; proficient in Azure services, IaC tools (Terraform, Bicep), and CI/CD workflows
- Experienced in designing and managing feature and vector stores, batch and streaming pipelines, and high-throughput data architectures for AI/ML systems
- Familiar with HL7, FHIR, DICOM standards and skilled in handling EHR, imaging, and clinical datasets with de-identification and compliance
- Strong understanding of HIPAA/HITRUST requirements and ability to implement encryption, RBAC, and audit logging
- Capable of mentoring team members, driving best practices, and partnering with clinicians, data scientists, and IT teams to deliver impactful solutions
- Adept at troubleshooting complex data challenges, optimizing performance, and exploring emerging technologies for scalable AI operations
- Able to clearly document processes and present technical concepts to both technical and non-technical audiences
- Master's degree (preferred)
- Must obtain at least one Epic Data Model certification (Clinical, Access, or Revenue) issued by Epic within 180 days of the date of entry into the job
- Any of the following: Azure Data Engineer Associate (DP-203), Epic Cogito Certification, HIPAA Privacy & Security Certification, HL7/FHIR Certification
- Healthcare experience in the AI/ML space is a must, along with two years of industry experience in a Senior Data Scientist role and knowledge of data privacy, security, and HIPAA compliance in healthcare