Huron is a company that helps healthcare organizations drive growth and improve clinical outcomes through innovation and technology. The Data Engineer role focuses on building and maintaining AI data capabilities to enhance the healthcare business, including creating a context platform for structured and unstructured data. The position involves collaborating with senior engineers and partners to deliver reliable AI data products.
Responsibilities:
- Build and contribute to the AI context platform: implement end-to-end pipelines (ingestion → parsing/chunking → enrichment → embeddings → vector indexing → retrieval/serving)
- Build and maintain patterns for incremental refresh, backfills, re-embeddings, deduplication, and lineage across unstructured sources
- Contribute to retrieval quality improvements (query strategies, hybrid search, metadata filtering) in partnership with AI engineers
- Deliver semantic and governed data products: implement semantic layers (metrics/entities) that consistently power both BI and agent reasoning
- Apply established data contracts and context contracts for AI inputs (schemas, metadata requirements, freshness, citation expectations)
- Ensure datasets and indexes are documented and reusable
- Support reliability and performance across assigned workstreams: monitoring, alerting, runbooks, and incident response
- Contribute to cost and latency optimization across warehouse/lakehouse and vector infrastructure
- Apply security-by-design patterns: RBAC/ABAC, PII redaction, retention controls, and audit logging
- Follow established guardrails for AI access to enterprise knowledge in coordination with Security/Legal/Compliance
Requirements:
- 3–6 years in data engineering or data platform roles with strong hands-on delivery
- Strong SQL and Python (or Scala/Java); solid production engineering habits
- Experience designing and operating cloud data pipelines at scale
- Experience working with unstructured data processing and search/retrieval concepts
- Clear communicator who can work effectively across technical and functional teams
- Hands-on experience with vector search and embeddings (pgvector/Pinecone/Weaviate/OpenSearch/Elastic) and retrieval patterns (semantic retrieval, hybrid search, reranking)
- Experience supporting LLM applications (RAG, agent tool interfaces, evaluation/observability)
- Familiarity with knowledge graphs/semantic modeling or metrics layers
- Experience in regulated environments and data governance programs