Design and implement data models, schemas, and ontologies for chemical, biological, and automation-generated data that serve discovery workflows across the portfolio
Define and maintain controlled vocabularies, metadata standards, and FAIR-compliant data frameworks in partnership with Preparedness4Insight
Implement semantic data standards (RDF, OWL, SPARQL) and ontology engineering practices to create interoperable, machine-readable scientific data
Design and implement data lakehouse architecture using modern platforms (Databricks, Snowflake, or equivalent), including data storage patterns, partitioning strategies, and query optimization
Build and optimize ETL/ELT pipelines using Spark, dbt, or similar tools to transform raw scientific data into analytical and ML-ready formats
Implement real-time and streaming data integration (Kafka, Kinesis, event-driven patterns) connecting LIMS, instruments, and lab automation systems to the data infrastructure
Design and implement knowledge graphs (Neo4j, Amazon Neptune, TigerGraph) that capture molecular, target, pathway, and experimental relationships across the discovery landscape
Architect specialized data solutions: array databases (TileDB) for genomics/imaging, document stores (MongoDB) for experimental records, and vector databases for embedding-based retrieval supporting ML and RAG workflows
Build query and traversal patterns that enable scientists and AI agents to ask relational questions across the entire data landscape
Partner with scientific software engineers to ensure data architectures are implementable, performant, and well-documented
Collaborate with Methods4Insight to design data structures that support analytical model training, deployment, and evaluation
Work with Tech@Lilly to define scaling strategies, ensure enterprise compliance, and transition data architectures to production-grade management
Contribute to build-versus-buy-versus-adopt decisions by evaluating commercial and open-source data platforms against Data Foundry requirements
Requirements
M.S. or PhD in Computer Science, Data Science, Bioinformatics, Computational Biology, Information Science, or related STEM field
6–12+ years of data architecture, data engineering, or scientific informatics experience
Deep expertise in at least one of the following focus areas: relational databases; data modeling and ontology engineering; data platform and lakehouse architecture (Databricks, Snowflake, Spark); or knowledge graph and specialized database systems (Neo4j, Neptune, MongoDB, TileDB)
Tech Stack
ETL
Kafka
MongoDB
Neo4j
Spark
Benefits
Participation in a company-sponsored 401(k)
Pension
Vacation benefits
Eligibility for medical, dental, vision, and prescription drug benefits
Flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts)
Life insurance and death benefits
Certain time off and leave of absence benefits
Well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities)