Mayo Clinic, a renowned healthcare organization, is seeking a skilled Data Engineer to join its Advanced Data Lake (ADL) team. This role is critical for building and optimizing data solutions that support innovation, analytics, and machine learning across the enterprise.
Responsibilities:
- Design, develop, and maintain data pipelines for ingestion, transformation, and integration of large-scale datasets
- Build and optimize data models to support advanced analytics and machine learning applications
- Collaborate with product teams and stakeholders to deliver data solutions aligned with ADL architecture
- Ensure data quality, security, and compliance throughout all processes
- Implement automation and monitoring for data workflows to improve reliability and performance
- Support cloud-based data platforms, primarily Google Cloud Platform (GCP), and integrate with enterprise systems
- Assemble large, complex data sets that meet functional and non-functional business requirements
- Build processes supporting data transformation, data structures, metadata, dependency, and workload management
- Apply a strong knowledge of SQL to efficiently process large volumes of data and troubleshoot SQL queries
- Proactively identify improvement opportunities (error detection, error correction, root cause analysis)
- Build automation to aid in verification and testing of data
- Effectively participate in multiple, concurrent projects
- Research new and existing data sources to contribute to new development, improve data management processes, and make recommendations for data quality initiatives
- Perform periodic data quality reviews for internal and external data
- Ensure timely resolution of queries and data issues
- Work with data and analytics experts to strive for greater functionality in our data systems
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field from an accredited university or college; OR an Associate's degree in Computer Science, Engineering, or a related field from an accredited university or college plus 2 years of experience
- Demonstrated ability to analyze and profile data to address business problems using advanced data modeling, source system databases, or data mining techniques
- Demonstrated application of problem-solving methodologies, planning techniques, continuous improvement methods, and analytical tools (e.g., data analysis, data profiling, and modeling)
- Ability to manage a varied workload of projects with multiple priorities and stay current on healthcare trends and enterprise changes
- Interpersonal and time management skills
- Strong analytical skills and the ability to identify and recommend solutions
- Advanced computer application skills and a commitment to customer service
- Experience with data analysis, quality, and profiling, including data exploration tools such as Rapid SQL, AQT, Information Analyzer, and Informatics
- Strong experience with Google Cloud Platform (BigQuery, Dataflow, Pub/Sub); Azure knowledge is a plus
- Familiarity with ETL tools, orchestration frameworks, and CI/CD pipelines
- Proficiency in Python, SQL, and other open-source programming languages
- Experience using Terraform to manage infrastructure as code
- GCP Professional Data Engineer certification preferred
- Experience building scalable data solutions for analytics and machine learning
- Experience working with structured and unstructured data in enterprise environments
- Strong problem-solving and communication skills
- Ability to work in cross-functional teams and manage multiple priorities