BayOne Solutions is seeking an Agentic Data Engineer to build and maintain an agentic data ingestion pipeline. The role involves cleaning and organizing data, validating cross-modal linkages, and collaborating with AI, software engineering, and computational biology groups to establish data standards.
Responsibilities:
- Build an agentic data ingestion pipeline
- Triage and prioritize incoming requests to ingest specific datasets
- Clean and organize the data. Build the first-pass cleaning and organization steps into the agentic flow
- Validate cross-modal linkage. Add automated checks that catch when ingested data does not connect correctly and flag low-quality or mismatched records
- Version every dataset. Retain prior versions and keep them addressable
- Preserve raw data and provenance. Make agent workflows log validation and transformation steps so lineage is traceable
- Make agents usable across teams. Move beyond bespoke steps towards agents that teams can reliably use as a shared, deployed service
- Collaborate with AI, software engineering, and computational biology groups to co-define data standards and conventions
Requirements:
- Agentic AI engineering: Demonstrated experience building multi-agent workflows or LLM workflows using tools/frameworks such as LangGraph or LlamaIndex, including tool/function calling and asynchronous task execution
- Python data engineering: Strong Python for data manipulation, working with APIs and databases, and handling heterogeneous data formats
- Data versioning and provenance: Familiarity with dataset versioning approaches (e.g., DVC, lakeFS, or equivalent)
- Working knowledge of scientific data structures: Comfortable with, or willing to learn, common omics data formats such as AnnData, H5AD, and TileDB
- Basic understanding of omics: No deep bioinformatics expertise required; just a basic understanding of the different modalities (e.g., how RNA-seq, scRNA-seq, and WES differ; genomics vs transcriptomics vs proteomics vs metabolomics)
- Unit testing: Comfortable writing unit and functional tests to ensure data processing workflows are reliable and reproducible
- Education: Degree in a technical field or equivalent practical experience
- Experience deploying agent workflows as a shared service (e.g., FastAPI or MCP endpoints)
- Exposure to cloud (AWS, GCP) and containerization (Docker)
- Familiarity with workflow managers such as Nextflow or Snakemake