Emory University is a leading research university that fosters excellence and attracts world-class talent to innovate today and prepare leaders for the future. The Lead Data Engineer will work as part of a project team to deliver quality applications and manage complex data sets while ensuring compliance with regulations. Responsibilities include leading technical teams, developing reporting infrastructure, and promoting emerging technologies.
Responsibilities:
- Works as a positive team member on projects that may include Business Analysts, Project Managers, Information Architects, Data Analysts, and/or Database Administrators to deliver quality applications and components within scope, on time, within budget, and in compliance with applicable regulations
- Manages workload effectively and reports task status in a timely manner; guides stakeholders on solutions and support structure; ensures proper governance when prioritizing work from partners
- Develops tactical strategies to support multiple clients
- Proactively identifies emerging technologies, develops proofs of concept, and promotes their adoption
- Leads small to medium-sized technical teams and mentors other analysts on regulatory adherence
- Follows standard operational procedures and HIPAA regulations
- Develops strategies for managing complex data sets through maintaining data standards and metadata
- Applies biomedical informatics technical standards, methodologies, and principles to research-specific program needs, objectives, and outcomes
- Develops reporting infrastructure to meet multiple client needs
- Generates standard templates for architecture documentation related to current and proposed informatics solutions
- Performs other related duties as required
Requirements:
- A bachelor's degree in a related field and seven years of related experience, OR an equivalent combination of education, training, and experience
- Hands-on experience with Snowflake or similar cloud data warehouse platforms
- Experience with dbt (data build tool) for data transformations
- Experience with Apache Airflow or similar workflow orchestration tools
- Proficiency with GitHub in a CI/CD environment, including pull requests, code reviews, and automated testing
- 5+ years of experience with AWS cloud solutions (S3, IAM, infrastructure)
- Strong SQL skills and experience with RDBMS (Oracle, SQL Server, MySQL, PostgreSQL)
- Proficiency in Python or a similar scripting language
- Experience designing and implementing large-scale data warehouses and ETL/ELT pipelines
- Understanding of relational and non-relational data models with direct modeling experience
- Experience with script optimization and automating data solutions
- Knowledge of Linux/Unix commands and shell scripting
- Experience with OMOP Common Data Model or similar standardized healthcare data models (PCORnet, FHIR, i2b2)
- Demonstrated experience with complex data mapping and transformation to standardized schemas
- Experience implementing role-based access control (RBAC) with complex permission structures
- Understanding of object-level sharing and data access governance in multi-user environments
- 5+ years in a healthcare or university research setting
- Experience working with EMR data (Epic, Cerner) is a plus
- Experience gathering requirements and designing solutions in partnership with stakeholders and senior leadership
- Ability to articulate complex technical concepts to non-technical audiences
- Experience collaborating with subject matter expert data engineers in specialized domains
- Experience with advanced processing techniques such as natural language processing and machine learning