Syneos Health® is a leading, fully integrated biopharmaceutical solutions organization built to accelerate customer success. The Sr. Biometrics Data Engineer will act as a technical lead to define architecture, then code, deploy, and maintain scalable ETL pipelines while managing complex datasets for research integration.
Responsibilities:
- Act as a hands-on technical lead who not only defines the architecture but also codes, deploys, and maintains scalable ETL pipelines and data structures
- Spearhead the technical implementation of data ingestion for the Translational Data Lake, managing complex datasets (genomics, proteomics, imaging, lab data, etc.) as they flow into modern cloud architectures
- Broader Research Integration: Lead data engineering projects beyond the Data Lake, designing bespoke integration solutions for diverse scientific data sources across the Research organization
- Data Transformation: Design and script automated procedures that normalize raw, inconsistently formatted data from external vendors (CROs) into a structured Common Data Model (CDM); an illustrative normalization sketch follows this list
- Technical Collaboration: Partner with various functions in Research and IT to align infrastructure with scientific needs, ensuring solutions are robust, FAIR-compliant, and scalable
- Develop and communicate the technical vision for biomarker data integration and reuse
- Architect and implement scalable ETL procedures, APIs, and front-end tools for data access and visualization; a minimal data-access API sketch follows this list
- Engage stakeholders to gather requirements and incorporate feedback into design
- Lead user acceptance testing (UAT) and ensure high-quality deliverables
- Collaborate with IT and Translational leads to align infrastructure and governance processes
- Champion FAIR principles and interoperability across translational and clinical programs
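
To make the Data Transformation responsibility concrete, below is a minimal sketch of the kind of vendor-to-CDM normalization the role involves, written in Python with pandas. The column names, mapping, and target schema are hypothetical placeholders; real mappings are defined per CRO and per assay.

```python
import pandas as pd

# Hypothetical vendor-to-CDM column mapping; actual mappings vary by CRO export.
COLUMN_MAP = {
    "SubjID": "subject_id",
    "VisitName": "visit",
    "TestCode": "analyte",
    "ResultValue": "value",
    "ResultUnits": "unit",
}

def normalize_vendor_file(path: str) -> pd.DataFrame:
    """Load one raw CRO export and reshape it to the target CDM schema."""
    raw = pd.read_csv(path)
    cdm = raw.rename(columns=COLUMN_MAP)[list(COLUMN_MAP.values())]
    # Coerce results to numeric; non-numeric entries become NaN for later review.
    cdm["value"] = pd.to_numeric(cdm["value"], errors="coerce")
    # Standardize identifiers before joining against other CDM tables.
    cdm["subject_id"] = cdm["subject_id"].str.strip().str.upper()
    return cdm.dropna(subset=["subject_id", "analyte"])
```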
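Similarly, here is a minimal sketch of a data-access API of the sort listed under the responsibilities, using FastAPI over the normalized CDM output. The endpoint path, file location, and schema are illustrative assumptions, not a production design.

```python
import pandas as pd
from fastapi import FastAPI

app = FastAPI(title="Biomarker data access (illustrative sketch)")

@app.get("/biomarkers/{subject_id}")
def get_biomarkers(subject_id: str) -> list[dict]:
    # A flat parquet file stands in for the Lakehouse query layer here.
    cdm = pd.read_parquet("cdm_biomarkers.parquet")  # hypothetical path
    rows = cdm[cdm["subject_id"] == subject_id.strip().upper()]
    return rows.to_dict(orient="records")
```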
Requirements:
- Education: Bachelor's or Master's degree in Computer Science, Data Engineering, Bioinformatics, or a related field
- Experience: 8+ years of professional experience in data engineering or software architecture, with a focus on building production-grade data pipelines
- Expert-level coding proficiency in Python with specific mastery of modern data engineering libraries (Pandas, PySpark, Dask, SQLAlchemy)
- Advanced proficiency with SQL, workflow orchestration tools (Airflow, Dagster, or Prefect; see the orchestration sketch after this list), and containerization (Docker/Kubernetes)
- Cloud Architecture: Deep experience with modern Data Lake and Lakehouse architectures (e.g., Microsoft Fabric, Databricks, Snowflake), with a proven track record of connecting and integrating disparate data sources
- Data Modeling: Solid understanding of data modeling, ETL processes, and schema design for complex datasets
- API Development: Experience designing and deploying APIs for data access
- Excellent communication and collaboration skills to bridge the gap between IT infrastructure and scientific stakeholders
- Familiarity with FAIR principles and metadata standards for scientific data
- Familiarity with CDISC clinical data standards (SDTM, ADaM) and biomarker data formats (NGS variant results, flow cytometry, serum proteomics, gene expression profiling)
- Direct experience with Microsoft Fabric tools for connecting and integrating data sources
- Proficiency in R for interoperability with bioinformatics teams
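
To illustrate the workflow-orchestration experience listed above, here is a minimal Airflow (TaskFlow API) sketch chaining extract, transform, and load steps. The DAG id, schedule, and paths are hypothetical placeholders, and the task bodies are stubs.

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def vendor_etl():
    """Hypothetical daily pipeline: fetch a CRO export, normalize to CDM, load it."""

    @task
    def extract() -> str:
        # Stub: in practice, pull the latest vendor file from SFTP or blob storage.
        return "/data/incoming/vendor_labs.csv"

    @task
    def transform(path: str) -> str:
        # Stub: apply the CDM normalization step and write a staging file.
        return "/data/staging/vendor_labs.parquet"

    @task
    def load(path: str) -> None:
        # Stub: merge the normalized file into the Data Lake / Lakehouse table.
        print(f"loading {path}")

    load(transform(extract()))

vendor_etl()
```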