Job Title: AI Data Engineer
Location: Rockville, MD (Hybrid)
Duration: Long-term
Role Overview
We are seeking an AI Data Engineer to design, build, and optimize data pipelines and retrieval systems for a Generative AI platform. This role focuses on ingesting, transforming, and indexing domain-specific data to enable accurate, context-aware AI responses.
You will collaborate closely with AI/Agent developers and platform engineering teams to improve retrieval quality and expand knowledge coverage.
Key Responsibilities
1. Data Engineering & ETL
Design and develop scalable ETL pipelines for structured and unstructured data
Build workflows for document parsing, transformation, and large-scale ingestion
Implement data validation and quality checks to ensure accuracy and completeness
Utilize AWS services such as S3, Lambda, Step Functions, OpenSearch, and Bedrock
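To illustrate the validation-and-transformation step above, here is a minimal Python sketch. The `Document` schema and field names are hypothetical, not from an actual pipeline; in practice this logic would sit inside a Lambda or Step Functions task.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Illustrative record shape; real schemas will differ.
    doc_id: str
    title: str
    body: str


def validate(record: dict) -> Document:
    """Basic completeness checks before a record enters the index."""
    for field in ("doc_id", "title", "body"):
        if not record.get(field, "").strip():
            raise ValueError(f"missing or empty field: {field}")
    return Document(
        record["doc_id"].strip(),
        record["title"].strip(),
        record["body"].strip(),
    )


def transform(doc: Document) -> dict:
    """Normalize a validated document into an index-ready payload."""
    return {"id": doc.doc_id, "text": f"{doc.title}\n\n{doc.body}"}
```

A record failing validation raises early, so incomplete data never reaches the search index.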
2. RAG Pipeline Development & Search Optimization
Architect and optimize Retrieval-Augmented Generation (RAG) pipelines
Define document chunking strategies and generate vector embeddings
Enhance retrieval quality through ranking, filtering, and hybrid search techniques
Evaluate system performance using retrieval accuracy metrics and benchmarks
Experiment with embedding models and search strategies to improve response relevance
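One common chunking strategy referenced above is a fixed-size sliding window with overlap, so context is not lost at chunk boundaries. This is a sketch only; the size and overlap values are illustrative defaults, and production systems often chunk on token or sentence boundaries instead of characters.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks
```

Each chunk would then be embedded and indexed; the overlap means a sentence cut by one chunk boundary still appears whole in a neighboring chunk.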
3. Quality Engineering & Testing
Design test strategies for validating data pipelines and ingestion workflows
Develop automated regression tests to monitor retrieval performance
Build evaluation frameworks to measure precision, recall, and relevance
Promote Test-Driven Development (TDD) practices
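The precision and recall measurements above can be expressed as simple rank-cutoff metrics. A minimal sketch, assuming retrieved results and a relevance judgment set are both keyed by document ID:

```python
def precision_recall_at_k(
    retrieved: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Precision@k and recall@k for one query's ranked retrieval results."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Wrapping calls like this in automated regression tests lets a team catch retrieval-quality drift whenever chunking, embeddings, or ranking change.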
4. Generative AI & Innovation
Stay current with advancements in RAG, embeddings, and retrieval systems
Explore techniques such as hybrid search, reranking, and contextual retrieval
Collaborate with AI teams to ensure high-quality, contextually relevant outputs
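As one example of the hybrid-search and reranking techniques mentioned above, reciprocal rank fusion (RRF) merges ranked lists from different retrievers (e.g., keyword and vector search) without needing to calibrate their scores. A sketch, with the conventional smoothing constant k=60:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked doc-ID lists: each doc scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents ranked highly by several retrievers accumulate the largest fused scores, which is why RRF is a common baseline before heavier cross-encoder reranking.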
5. Security & Compliance
Follow secure coding practices, especially for sensitive and PII data
Ensure compliance with organizational security standards and policies
Participate in threat modeling and secure system design discussions
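As a narrow illustration of handling PII in pipeline text, a masking pass can run before data is logged or indexed. This regex-based example covers email addresses only and is not a substitute for a vetted PII-detection service:

```python
import re

# Simplified email pattern for demonstration; real-world detection
# should use a dedicated, audited PII-detection tool.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_pii(text: str) -> str:
    """Replace email addresses with a placeholder before storage or logging."""
    return EMAIL_RE.sub("[EMAIL]", text)
```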
Required Skills & Qualifications
Strong experience in data engineering and ETL pipeline development
Hands-on experience with AWS services (S3, Lambda, Step Functions, OpenSearch)
Experience with RAG pipelines, vector databases, and embeddings
Proficiency in Python and data processing frameworks
Understanding of search optimization, ranking, and retrieval techniques
Experience with data validation, testing frameworks, and TDD
Familiarity with Generative AI platforms and LLM integrations
Nice to Have
Experience with Amazon Bedrock or similar LLM platforms
Knowledge of hybrid search and reranking techniques
Exposure to MLOps or AI platform engineering
Experience handling PII and secure data systems