Spring House, New Jersey, United States of America
Full Time
2 hours ago
$117,000 - $201,250 USD
Key skills
Azure, Docker, GraphQL, Jenkins, Neo4j, SQL, RAI, NLP, Natural Language Processing, RAG, Azure DevOps, Git, GitLab, CI/CD, Stakeholder Management
About this role
Role Overview
Be a key contributor to the design and implementation of a scalable knowledge graph infrastructure for Oncology R&D data, with an emphasis on data standardization and interoperability.
Apply graph-based data modeling for efficient organization, integration and retrieval of Oncology R&D data, ensuring system flexibility and long-term maintainability.
Work with a larger community of Data Scientists, Clinical Scientists, and Discovery Scientists to standardize, curate and create AI-Ready datasets.
Curate and extend ontologies so that they map clearly onto established biomedical ontologies and controlled terminologies, using Resource Description Framework (RDF) standards.
Work with SPARQL/GraphQL/REST services; develop ingestion and curation pipelines to ingest, normalize and map concepts across data sources.
Extend and curate Oncology R&D-relevant ontologies (e.g., diseases, drugs, targets, pathways, etc.) and maintain synonyms, cross-references, and provenance.
Partner with cross-functional teams to enable NLP/RAG over graphs, features for predictive modeling and terminology services for search and study design tools.
Work with Data Science & Digital Health colleagues, IT and DevOps teams to deploy and manage the graph database infrastructure, focusing on high availability, scalability, and recovery operations specifically geared toward Oncology R&D needs and applications.
Draft and manage documentation, such as data dictionaries, data lineage, and data flow diagrams, to facilitate understanding of the knowledge graph.
Requirements
Ph.D. or Master's degree (desired) in bioengineering, computer science, IT, bioinformatics, physics, mathematics, or a related field, with an emphasis on semantic technologies for biomedical applications.
5+ years professional experience in health informatics.
Demonstrated experience in large-scale knowledge graph construction, ontology development, and data integration in pharmaceutical or healthcare domains.
Programming background in parser combinators, natural language processing, and linked data (RDF triple stores and property graphs).
Proficiency in semantic web technologies (e.g. SPARQL, RDF, OWL) and familiarity with graph databases (e.g. Neo4j, Amazon Neptune).
Proven work with complex biomedical datasets (e.g. clinical, genomics, proteomics).
Proficiency in various data storage solutions (SQL, key-value, column, document, graph stores) and data modeling techniques (semantic data, ontologies, taxonomies).
Experience with CI/CD implementations, Git, CI/CD stacks (Jenkins, GitLab, Azure DevOps), DevOps tools, metrics/monitoring, and containerization technologies (Docker, Singularity).
Demonstrated stakeholder management capabilities, including requirements gathering, business analysis and planning.
Must have the capacity to translate discussions into user requirements and project plans.
Ability to manage numerous projects simultaneously, prioritize work, and demonstrate the organizational skills and flexibility needed to deliver maximum business value.
Willingness to conduct periodic travel (<15% of time) to conferences and internal meetings.
Tech Stack
Azure
Docker
GraphQL
Jenkins
Neo4j
SQL
Benefits
medical
dental
vision
life insurance
short- and long-term disability
business accident insurance
group legal insurance
consolidated retirement plan (pension)
savings plan (401(k))
Vacation – up to 120 hours per calendar year
Sick time – up to 40 hours per calendar year
Holiday pay, including Floating Holidays – up to 13 days per calendar year