Job Title: AI Data Engineer
Location: Rockville, MD (Hybrid)
Duration: Long-term
Role Overview
We are seeking an AI Data Engineer to design, build, and optimize data pipelines and retrieval systems for a Generative AI platform. This role focuses on ingesting, transforming, and indexing domain-specific data to enable accurate, context-aware AI responses.
You will collaborate closely with AI/Agent developers and platform engineering teams to improve retrieval quality and expand knowledge coverage.
Key Responsibilities
1. Data Engineering & ETL
Design and develop scalable ETL pipelines for structured and unstructured data
Build workflows for document parsing, transformation, and large-scale ingestion
Implement data validation and quality checks to ensure accuracy and completeness
Utilize AWS services such as S3, Lambda, Step Functions, OpenSearch, and Bedrock
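To illustrate the validation-and-transformation step above, here is a minimal Python sketch. The `Document` schema and field names are hypothetical, not from an actual pipeline; in practice this logic would sit inside a Lambda or Step Functions task.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Illustrative record shape; real schemas will differ.
    doc_id: str
    title: str
    body: str


def validate(record: dict) -> Document:
    """Basic completeness checks before a record enters the index."""
    for field in ("doc_id", "title", "body"):
        if not record.get(field, "").strip():
            raise ValueError(f"missing or empty field: {field}")
    return Document(
        record["doc_id"].strip(),
        record["title"].strip(),
        record["body"].strip(),
    )


def transform(doc: Document) -> dict:
    """Normalize a validated document into an index-ready payload."""
    return {"id": doc.doc_id, "text": f"{doc.title}\n\n{doc.body}"}
```

A record failing validation raises early, so incomplete data never reaches the search index.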
2. RAG Pipeline Development & Search Optimization
Architect and optimize Retrieval-Augmented Generation (RAG) pipelines
Define document chunking strategies and generate vector embeddings
Enhance retrieval quality through ranking, filtering, and hybrid search techniques
Evaluate system performance using retrieval accuracy metrics and benchmarks
Experiment with embedding models and search strategies to improve response relevance
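One common chunking strategy referenced above is a fixed-size sliding window with overlap, so context is not lost at chunk boundaries. This is a sketch only; the size and overlap values are illustrative defaults, and production systems often chunk on token or sentence boundaries instead of characters.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks
```

Each chunk would then be embedded and indexed; the overlap means a sentence cut by one chunk boundary still appears whole in a neighboring chunk.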
3. Quality Engineering & Testing
Design test strategies for validating data pipelines and ingestion workflows
Develop automated regression tests to monitor retrieval performance
Build evaluation frameworks to measure precision, recall, and relevance
Promote Test-Driven Development (TDD) practices
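The precision and recall measurements above can be expressed as simple rank-cutoff metrics. A minimal sketch, assuming retrieved results and a relevance judgment set are both keyed by document ID:

```python
def precision_recall_at_k(
    retrieved: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Precision@k and recall@k for one query's ranked retrieval results."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Wrapping calls like this in automated regression tests lets a team catch retrieval-quality drift whenever chunking, embeddings, or ranking change.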
4. Generative AI & Innovation
Stay current with advancements in RAG, embeddings, and retrieval systems
Explore techniques such as hybrid search, reranking, and contextual retrieval
Collaborate with AI teams to ensure high-quality, contextually relevant outputs
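As one example of the hybrid-search and reranking techniques mentioned above, reciprocal rank fusion (RRF) merges ranked lists from different retrievers (e.g., keyword and vector search) without needing to calibrate their scores. A sketch, with the conventional smoothing constant k=60:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked doc-ID lists: each doc scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents ranked highly by several retrievers accumulate the largest fused scores, which is why RRF is a common baseline before heavier cross-encoder reranking.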
5. Security & Compliance
Follow secure coding practices, especially for sensitive and PII data
Ensure compliance with organizational security standards and policies
Participate in threat modeling and secure system design discussions
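As a narrow illustration of handling PII in pipeline text, a masking pass can run before data is logged or indexed. This regex-based example covers email addresses only and is not a substitute for a vetted PII-detection service:

```python
import re

# Simplified email pattern for demonstration; real-world detection
# should use a dedicated, audited PII-detection tool.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_pii(text: str) -> str:
    """Replace email addresses with a placeholder before storage or logging."""
    return EMAIL_RE.sub("[EMAIL]", text)
```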
Required Skills & Qualifications
Strong experience in data engineering and ETL pipeline development
Hands-on experience with AWS services (S3, Lambda, Step Functions, OpenSearch)
Experience with RAG pipelines, vector databases, and embeddings
Proficiency in Python and data processing frameworks
Understanding of search optimization, ranking, and retrieval techniques
Experience with data validation, testing frameworks, and TDD
Familiarity with Generative AI platforms and LLM integrations
Nice to Have
Experience with Amazon Bedrock or similar LLM platforms
Knowledge of hybrid search and reranking techniques
Exposure to MLOps or AI platform engineering
Experience handling PII and secure data systems