Partner with scientific, ML, and product stakeholders to define a data roadmap: which datasets move the needle, which should be refreshed, and what “good enough” looks like for each use case. Establish clear success metrics for onboarding speed, dataset quality, and downstream usability (e.g., fewer training/data failures, higher match rates, better coverage, higher-confidence labels).
Proactively scout and integrate public and client datasets, plus relevant literature and reference materials, to keep our corpora current and comprehensive. Design a repeatable dataset intake workflow including provenance, source tracking, and refresh cadence.
Define curation standards that make data consistent across sources and modalities, including compound identity management, biological/sample metadata standardization, and schema and convention mappings. Build a scalable approach to integrating metabolomics now and expanding to additional omics without reinventing everything each time. Develop practical QC/QA frameworks that combine scientific judgment with repeatable checks.
Work closely with leadership in engineering, AI, product, and scientific discovery to align initiatives with company-wide goals. Draw on your experience to keep initiatives moving smoothly. Translate ambiguous questions into crisp data requirements, priorities, and execution plans. Build trust across disciplines by being both scientifically rigorous and pragmatically execution-oriented.
Requirements
6+ years of demonstrated experience owning scientific data work end-to-end (curation, standardization, QC, documentation, governance) in bioinformatics, cheminformatics, computational biology, scientific data engineering, or related roles.
Ability to navigate complex chemical and biological datasets, reconcile identifiers/metadata across sources, and make data consistently usable for end users.
Strong attention to detail with a keen ability to balance priorities and deliver incremental value while operating with minimal oversight.
Comfortable building structure from scratch: you can define processes, set standards, and iterate toward scalable practices in an early-stage environment.
Practical proficiency in Python and SQL for data investigation, transformation, QC, and automation.
Familiarity with modern data workflows (structured + semi-structured data, pipelines, reproducibility, documentation).
Experience with chemical structure representations and normalization (e.g., SMILES/InChI, canonicalization, salt/tautomer handling, stereochemistry considerations).
Demonstrated ability to communicate and collaborate with product, machine learning, applied science, and engineering teams while reducing complex business questions into valuable, reliable technical solutions.
A passion for contributing to an early-stage startup where autonomy, eagerness to learn, and enthusiasm for solving novel scientific challenges prevail over rigid processes and egos.
Tech Stack
Python
SaltStack
SQL
Benefits
health & dental
vision
long- and short-term disability
life insurance
401k with company match
flexible work & unlimited time away policy
commuter benefits and parking
regular team meals and outings
company support for continued education/coursework and conference participation