Huron is a consulting company that helps healthcare organizations innovate and improve performance. The Staff Data Engineer will lead the development of an AI context platform, focusing on building data capabilities and delivering data products to enhance operational excellence and governance.
Responsibilities:
- Build and own the AI context platform
  - Design and implement end-to-end pipelines: ingestion → parsing/chunking → enrichment → embeddings → vector indexing → retrieval/serving
  - Build scalable patterns for incremental refresh, backfills, re-embeddings, deduplication, and lineage across unstructured sources
  - Improve retrieval quality (query strategies, hybrid search, metadata filtering, reranking hooks) in partnership with AI engineers
- Deliver semantic and governed data products
  - Define and implement semantic layers (metrics/entities) that power BI and agent reasoning consistently
  - Establish data contracts and “context contracts” for AI inputs (schemas, metadata requirements, freshness, citation expectations)
  - Ensure datasets and indexes are discoverable, documented, and reusable
- Own reliability and performance: monitoring, alerting, SLAs/SLOs, runbooks, incident response, postmortems
- Optimize cost and latency across warehouse/lakehouse and vector infrastructure
- Implement security-by-design: RBAC/ABAC patterns, PII redaction, retention controls, audit logging, and safe access pathways for agent tools
- Partner with Security/Legal/Compliance on guardrails for AI access to enterprise knowledge
- Drive technical direction and roadmap decomposition with product/AI/application stakeholders
- Set best practices for testing, CI/CD, and evaluation (retrieval eval sets, regression tests, online telemetry)
- Mentor engineers via pairing, code reviews, and lightweight enablement sessions
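The pipeline stages listed above (ingestion → chunking → embedding → indexing → retrieval) can be sketched end to end. This is a minimal, illustrative sketch only: `embed` is a toy bag-of-words stand-in for a real embedding model, `VectorIndex` is an in-memory stand-in for a real vector store (pgvector, OpenSearch, etc.), and all names are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse bag-of-words vector (stand-in for a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Entry:
    doc_id: str
    text: str
    vector: Counter


class VectorIndex:
    """In-memory stand-in for a managed vector store."""

    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def ingest(self, doc_id: str, text: str) -> None:
        # Ingestion → chunking → embedding → indexing, in one pass.
        for piece in chunk(text):
            self.entries.append(Entry(doc_id, piece, embed(piece)))

    def search(self, query: str, k: int = 2) -> list[Entry]:
        # Retrieval: embed the query, rank chunks by cosine similarity.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e.vector),
                        reverse=True)
        return ranked[:k]
```

In production, the toy pieces would be swapped for a parsing/chunking library, a real embedding model, and a managed vector index, with incremental refresh and re-embeddings driven by the orchestration layer.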
Requirements:
- 6–10+ years in data engineering/platform roles with significant hands-on delivery
- Expert SQL and strong Python (or Scala/Java), with solid production engineering habits
- Proven experience designing cloud data pipelines and operating them reliably at scale
- Experience working with unstructured data processing and search/retrieval concepts
- Strong communication skills and ability to lead cross-functionally
- Hands-on experience with vector search and embeddings (pgvector/Pinecone/Weaviate/OpenSearch/Elastic) and retrieval patterns (semantic retrieval, hybrid search, reranking)
- Experience supporting LLM applications (RAG, agent tool interfaces, evaluation/observability)
- Familiarity with knowledge graphs, semantic modeling, or metrics layers at scale
- Experience in regulated environments with mature governance programs
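One common hybrid-search pattern behind the retrieval requirements above is fusing a keyword ranking with a semantic ranking. A minimal sketch using reciprocal rank fusion (RRF), assuming each retriever returns an ordered list of document IDs (the constant k=60 is the value conventionally used with RRF; all names here are illustrative):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Example: a keyword (BM25-style) ranking and a vector ranking over the
# same corpus, fused into a single ordering.
keyword_ranking = ["doc1", "doc2", "doc3"]
semantic_ranking = ["doc2", "doc3", "doc1"]
fused = rrf_fuse([keyword_ranking, semantic_ranking])
```

RRF is attractive in practice because it needs no score normalization across retrievers, only their rank orderings, which makes it a low-risk first hook before adding a learned reranker.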