Design, build, maintain, and optimize scalable, secure, and resilient data pipelines using Spark, Databricks, Delta Lake, and AWS
Support data flow across clinical systems and build transformations aligned with CDISC standards
Implement frameworks for data quality, testing, monitoring, and performance optimization; perform clinical data cleaning, reconciliation, validation, and QC
Partner with Platform, Cloud, and DevOps teams to evolve the clinical data platform; own CI/CD automation, infrastructure as code, observability, lineage, and pipeline monitoring
Lead complex cross‑functional data engineering initiatives, serving as a senior engineering leader and mentoring engineers
Integrate data across R&D lifecycle domains, including API‑based ingestion and metadata management
Collaborate with cross‑functional teams, CROs, and external vendors, fostering an open, respectful, and inclusive team culture
Engage with governance, compliance, and security teams to ensure alignment with GxP, CSV, FAIR principles, privacy, and data ethics requirements
Build and maintain data models, specifications, mapping documents, and QC documentation, contributing to enterprise data strategies and architecture roadmaps
Requirements
Bachelor’s or master’s degree in computer science, engineering, a related field, or equivalent practical experience
6 to 10+ years of data engineering experience with production‑grade data pipelines and large‑scale distributed systems
2 to 6+ years of experience with Databricks and AWS, plus ETL/ELT tools and cloud data lake/warehouse solutions
Experience in regulated biotech/pharma or other complex scientific environments, including GxP/CSV and data governance (privacy, data ethics)
Strong SQL, PySpark, and Python skills; proficiency with SAS and R; experience with CI/CD, DevOps (GitHub), infrastructure automation (Terraform), data modeling, metadata management, and data quality frameworks
Hands‑on experience with clinical data standards and systems, and with data masking and blinded/unblinded workflows
Familiarity with ML/AI workloads and model‑ready data engineering; experience across translational science, clinical development, safety, regulatory, and real‑world evidence domains
Experience working with CROs and external vendors; strong analytical, problem‑solving, communication, and cross‑functional partnership skills
A customer‑oriented and agile mindset, with the ability to manage competing priorities and work effectively with people from diverse disciplines, cultures, and backgrounds
Tech Stack
AWS
Cloud
Distributed Systems
ETL
PySpark
Python
Spark
SQL
Terraform
Benefits
Competitive remuneration packages determined by role and location
Varied benefits in support of our diverse employee base