Design, develop and maintain data pipelines for acquiring, managing and storing Oncology R&D data from diverse sources (e.g. biomarker labs, real-world data sources, pre-clinical applications)
Work closely with Data Science and Oncology R&D partners to understand, document and prioritize business requirements. Translate these business needs into high quality data products.
Work closely with other technical leaders, such as Ontology and Knowledge Graph Engineers, to design and deliver future-proof, AI-ready data systems aligned with Oncology R&D business needs.
Develop Oncology R&D-specific data repositories by implementing standard enterprise-level data models and creating new data models as needed.
Leverage cloud-based technology platforms to accomplish goals, such as building and maintaining data repositories using AWS S3.
Create and optimize data flows for structured and unstructured data using technologies such as Python, R, SQL, AWS services and other relevant tools.
Implement quality and performance standards and measure KPIs to determine accuracy and consistency.
Leverage and implement data versioning and lineage tracking to support data traceability and compliance, and maintain documentation for data architectures and workflows.
In adherence to internal standards, implement software development best practices such as code versioning and DevOps.
Requirements
Advanced degree (Master’s or equivalent) in Computer Science, Engineering, Life Sciences, or other relevant field is strongly preferred.
3+ years of experience in data engineering, including data modeling and database design, preferably in the healthcare industry.
Proficiency in data engineering tools such as Python, R and SQL for data processing as well as cloud architecture (e.g. AWS services, Redshift, FSx, Glue, Lambda).
Experience with unstructured database technologies (e.g. NoSQL) as well as other database types (e.g. Graph).
Strong skills in analysis, problem-solving, organizational change, project delivery, and managing external vendors.
Proven record leading improvement initiatives with multi-disciplinary and remote partners.
Demonstrated stakeholder management capabilities, including requirements gathering, business analysis and planning.
Must have the capacity to translate discussions into user requirements and project plans.
Ability to manage numerous projects simultaneously, prioritize work, and exhibit the organizational skills and flexibility needed to deliver maximum business value.
Willingness to conduct periodic travel (<15% of time) to conferences and internal meetings.
Tech Stack
Amazon Redshift
AWS
Cloud
NoSQL
Python
SQL
Benefits
medical
dental
vision
life insurance
short- and long-term disability
business accident insurance
group legal insurance
retirement plan (pension)
savings plan (401(k))
up to 120 hours vacation per calendar year
up to 40 hours sick time per calendar year
up to 13 days holiday pay including floating holidays per calendar year
up to 40 hours work, personal and family time per calendar year
Principal Data Scientist, R&D Oncology at Johnson & Johnson | JobVerse