Mayo Clinic is a top-ranked healthcare provider dedicated to putting patient needs first. It is seeking a Principal Data Engineer to develop and deploy data solutions that support analytics and machine learning applications, while providing technical leadership and consultative services across the organization.

Responsibilities:

Develop and deploy data pipelines, integrations, and transformations that support analytics and machine learning applications and solutions as part of an assigned product team
Maintain an understanding of the organization's current solutions, coding languages, and tools, regularly applying independent judgment
Provide consultative services to departments/divisions and leadership committees
Design, build, and operate large-scale healthcare data platforms and data ecosystems, drawing on demonstrated experience in this area
Partner with product owners, clinical stakeholders, and AI/ML experts to identify and retrieve data, conduct exploratory analysis, and build pipelines that transform data to support the creation of agentic systems and the development of state-of-the-art multimodal foundation models
Provide technical leadership in architecting scalable, cost-efficient data solutions, optimizing data movement and storage strategies, and ensuring secure, compliant access to healthcare data assets across hybrid and multi-cloud environments

Requirements:

A Bachelor's degree in a relevant field such as engineering, mathematics, computer science, information technology, health science, or another analytical/quantitative field, and a minimum of seven years of professional or research experience in data visualization, data engineering, or analytical modeling techniques; OR an Associate's degree in one of the same fields and a minimum of nine years of professional or research experience in data visualization, data engineering, or analytical modeling techniques
In-depth business or practice knowledge will also be considered
The incumbent must be able to manage a varied workload of projects with multiple priorities and stay current on healthcare trends and enterprise changes
Interpersonal skills, time management skills, and demonstrated experience working on cross-functional teams are required
Strong analytical skills, the ability to identify and recommend solutions, and a commitment to customer service are required
The position requires excellent verbal and written communication skills, attention to detail, and a high capacity for learning and problem resolution
Advanced experience in SQL is required
Advanced experience in programming and scripting languages such as Python, JavaScript, PHP, C++, or Java, and in API integration, is required
Experience with hybrid (batch and streaming) data processing using tools such as Apache Spark, Hive, Pig, and Kafka is required
Experience with big data, statistics, and machine learning is required
The ability to navigate Linux and Windows operating systems is required
Knowledge of workflow scheduling (Apache Airflow, Google Cloud Composer), containerization and orchestration (Docker, Kubernetes), and CI/CD (Jenkins, GitHub Actions) is required
Experience in DataOps/DevOps and agile methodologies is required
An advanced degree is preferred
Strong healthcare data knowledge including electronic health records (EHR), clinical, operational, imaging, genomic, and research data domains, as well as familiarity with healthcare interoperability standards such as HL7, FHIR, DICOM, OMOP, and related healthcare data models
Demonstrated experience designing and optimizing large-scale data movement, integration, and transformation solutions involving terabyte- to petabyte-scale datasets, with consideration for performance, scalability, reliability, and cost efficiency
Experience architecting and supporting hybrid data platforms spanning cloud and on-premises environments, including data residency, security, governance, and compliance requirements
Experience with multiple cloud platforms such as Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure, including cloud-native data engineering services and cross-cloud data integration patterns
Experience evaluating and optimizing data transfer, storage, and compute costs while meeting performance, availability, and service-level objectives
Knowledge of healthcare data governance, data quality frameworks, master data management, metadata management, and regulatory requirements including HIPAA and related healthcare privacy standards
Experience supporting AI/ML, generative AI, and foundation model initiatives through the development of scalable, high-quality data pipelines and data products
Demonstrated ability to provide technical leadership and architectural guidance for enterprise-scale data engineering initiatives

Principal Data Engineer - AI Program