Jazz Pharmaceuticals is a global biopharmaceutical company dedicated to transforming the lives of patients through innovative medicine development. The Senior Principal Data Engineer will lead data engineering initiatives, overseeing the design and maintenance of data pipelines to support research and development efforts.
Responsibilities:
- Lead the design, development, and maintenance of scalable data pipelines that integrate diverse research data sources, leveraging AWS cloud technologies
- Create and optimize ETL/ELT processes for both structured and unstructured data using Python, R, SQL, and AWS services to ensure efficient data processing
- Build and manage data repositories utilizing AWS S3 and FSx, and establish data warehousing solutions with Amazon Redshift to support analytics and reporting
- Develop and maintain standard data models that facilitate data consistency and interoperability across research domains
- Implement and oversee data quality frameworks, validation procedures, and KPIs to ensure high data integrity and accuracy
- Establish data versioning and lineage tracking mechanisms to support compliance, audit readiness, and data traceability
- Maintain comprehensive documentation for data architectures, workflows, and processes to ensure transparency and reproducibility
- Apply modern software development best practices, including code versioning, DevOps, and continuous integration and continuous deployment (CI/CD) pipelines
- Collaborate with R&D researchers, data scientists, and stakeholders to understand data requirements and deliver tailored data solutions
- Ensure compliance with data privacy regulations such as HIPAA and GDPR, maintaining secure and ethical data handling practices
- Support data literacy initiatives by developing and delivering training sessions to enhance team capabilities
Requirements:
- Bachelor's Degree in Computer Science, Statistics, Mathematics, Life Sciences, or a related scientific field
- 5-7 years of experience in data engineering, including at least 2 years focused on healthcare, research, or clinical data
- Expert knowledge of data engineering languages such as Python, R, and SQL
- Proficiency in AWS services including S3, Redshift, FSx, Glue, and Lambda
- Strong experience with relational databases, data modeling, and database design
- Familiarity with NoSQL and graph databases for unstructured data, as well as containerization and orchestration tools such as Docker and EKS/Kubernetes
- Experience processing and managing large-scale (big data) datasets
Preferred Qualifications:
- Master's Degree
- Knowledge of healthcare data standards such as CDISC, HL7, FHIR, SNOMED CT, OMOP, and DICOM
- Experience with MLOps, model deployment, and working within an Agile environment