Huron is a consulting firm that helps healthcare organizations enhance performance and drive growth. The firm is seeking a Senior Data Engineer to lead the development of an AI context platform, with a focus on building data capabilities and ensuring operational excellence in collaboration with cross-functional teams.
Responsibilities:
- Build and own the AI context platform
- Design and implement end-to-end pipelines: ingestion → parsing/chunking → enrichment → embeddings → vector indexing → retrieval/serving
- Build scalable patterns for incremental refresh, backfills, re-embeddings, deduplication, and lineage across unstructured sources
- Improve retrieval quality (query strategies, hybrid search, metadata filtering, reranking hooks) in partnership with AI engineers
- Define and implement semantic layers (metrics/entities) that power BI and agent reasoning consistently
- Establish data contracts and “context contracts” for AI inputs (schemas, metadata requirements, freshness, citation expectations)
- Ensure datasets and indexes are discoverable, documented, and reusable
- Own reliability and performance: monitoring, alerting, SLAs/SLOs, runbooks, incident response, postmortems
- Optimize cost and latency across warehouse/lakehouse and vector infrastructure
- Implement security-by-design: RBAC/ABAC patterns, PII redaction, retention controls, audit logging, and safe access pathways for agent tools
- Partner with Security/Legal/Compliance on guardrails for AI access to enterprise knowledge
- Drive technical direction and roadmap decomposition with product/AI/application stakeholders
- Set best practices for testing, CI/CD, and evaluation (retrieval eval sets, regression tests, online telemetry)
- Mentor engineers via pairing, code reviews, and lightweight enablement sessions
Requirements:
- 6-10+ years in data engineering/platform roles with significant hands-on delivery
- Expert SQL and strong Python (or Scala/Java); strong production engineering habits
- Proven experience designing cloud data pipelines and operating them reliably at scale
- Experience working with unstructured data processing and search/retrieval concepts
- Strong communication skills and ability to lead cross-functionally
- Hands-on experience with vector search and embeddings (pgvector/Pinecone/Weaviate/OpenSearch/Elastic) and retrieval patterns (semantic retrieval, hybrid search, reranking)
- Experience supporting LLM applications (RAG, agent tool interfaces, evaluation/observability)
- Familiarity with knowledge graphs/semantic modeling or metrics layers at scale
- Experience in regulated environments and mature governance programs