Design and maintain a robust semantic layer that translates raw database schemas into high-context metadata, allowing LLMs and autonomous agents to reason across enterprise data
Integrate ERPs, operational tools, and legacy systems into a clean, unified internal data layer that powers Skiffra’s orchestration engine
Design the schemas and data contracts consumed by LLMs and workflow engines to ensure predictable, high-fidelity inputs from varied, often "messy" sources
Ensure every data point carries the necessary lineage and metadata for an LLM to understand its business significance
Architect the end-to-end ingestion and normalization pipelines for structured, semi-structured, and unstructured data, transforming fragmented enterprise sources into a high-fidelity stream for AI agents
Implement the monitoring, observability, and automated data quality gates necessary to ensure our orchestration engine doesn't act on stale or corrupted enterprise context
Partner closely with product and engineering to translate complex operational needs into scalable data systems
Operate with speed and rigor in environments where reliability matters
Requirements
7+ years building production-grade data engineering or backend systems
Mastery of Python and SQL
Experience with ETL/ELT pipelines and API-based integrations
Experience with cloud data infrastructure and streaming or event-driven systems
Ability to work independently and make sound technical tradeoffs
Proven track record building systems where data discovery and cataloging were core features
Comfort thriving in "zero-to-one" environments without a roadmap or perfect documentation
Tech Stack
Cloud
ETL
Python
SQL
Benefits
Health insurance
Retirement plans
Participation in the Company’s bonus and incentive programs