Design, develop, and maintain scalable RAG/CAG pipelines for AI-powered applications
Build and optimize document ingestion workflows for structured and unstructured data sources
Manage and maintain vector stores to support semantic search and retrieval capabilities
Develop OCR processing pipelines for historical and modern document collections spanning 1781–2025
Optimize retrieval performance, relevance, and ranking strategies for LLM-based systems
Build reliable data pipelines that support integrations with large language models and AI services
Collaborate with engineers, UX teams, product owners, and stakeholders to deliver scalable AI solutions
Ensure data quality, integrity, security, and performance across ingestion and retrieval systems
Implement monitoring, logging, and troubleshooting for AI and data processing workflows
Contribute to architecture decisions, technical documentation, and engineering best practices
Participate in agile pod-based development teams and continuous improvement initiatives

4+ years of experience in data engineering, data platform development, or AI/ML infrastructure
Strong experience building RAG and/or CAG pipelines
Hands-on experience with vector databases and semantic retrieval systems
Experience developing document ingestion and OCR processing workflows
Strong understanding of LLM integrations and AI data pipeline architectures
Experience working with structured, semi-structured, and unstructured datasets
Proficiency with Python and modern data engineering frameworks
Familiarity with APIs, ETL/ELT pipelines, and distributed processing systems
Experience building and operating data pipelines in secure federal cloud environments, including FedRAMP Moderate and Zero Trust architectures, with appropriate handling of sensitive data and Controlled Unclassified Information (CUI)
Ability to obtain and maintain a federal Public Trust (or higher) clearance
Strong analytical, troubleshooting, and performance optimization skills
Ability to work effectively in agile or pod-based delivery environments
Excellent communication and collaboration skills

Data Engineer
