Define the target-state Data & AI Foundations architecture supporting agentic AI use cases, including RAG pipelines, enterprise knowledge graph or metadata layer, data products, and AI-ready datasets.
Own the strategy and roadmap for making key enterprise data sources 'AI-ready': curation, quality, metadata, access patterns, latency requirements, and retention.
Partner with source system owners (core servicing, CRM, collections, risk, fraud, etc.) to define data contracts, SLAs, and integration patterns that support downstream RAG and analytics.
Design and govern canonical data models and semantic layers used by RAG pipelines, memory stores, and analytics to ensure consistency across agents and domains.
Lead the design of RAG data infrastructure on cloud (e.g., PostgreSQL, Redshift, vector stores, object storage) and ensure it aligns with performance, cost, and compliance constraints.
Define and implement RAG evaluation strategies including retrieval quality metrics, ranking and re-ranking optimization, relevance scoring, and A/B testing frameworks for continuous improvement.
Establish data preparation and curation pipelines for model fine-tuning, including dataset selection, labeling strategies, quality validation, versioning, and compliance with model risk policies.
Design and optimize retrieval strategies for RAG systems: chunking approaches, embedding models, indexing techniques, ranking algorithms, re-ranking logic, and hybrid search patterns.
Build and maintain robust data pipelines (batch and streaming) that ingest, transform, enrich, and deliver data into RAG systems, vector stores, feature stores, and agent contexts with appropriate SLAs.
Collaborate with the Enterprise AI Platform team on how data services (RAG APIs, feature stores, metadata services) are exposed as platform primitives for agent builders.
Define and enforce data governance policies for AI: data classification, lineage, access controls, PII handling, retention, and usage logging for AI workloads.
Partner with AI Governance/Model Risk and InfoSec/AppSec to ensure data usage in prompts, context, and tools adheres to policies, including regulatory, privacy, and model risk requirements.
Establish data quality and observability practices for AI data: data SLAs, freshness, completeness, drift detection, and business rule validation tied to AI outcomes.
Drive adoption of metadata and catalog tools so platform and agent teams can discover, understand, and safely consume datasets and RAG endpoints.
Define and oversee patterns for integrating external data (third-party, public, partner data) into AI workflows, including licensing checks, quality assessment, and monitoring.
Perform other duties and/or special projects as assigned.
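To make the RAG retrieval work above (chunking approaches, embedding pipelines, ranking) concrete, here is a minimal, self-contained sketch. It is purely illustrative: the bag-of-words `embed` function stands in for a real embedding model, and all function names, parameters, and the cosine-ranking approach are assumptions, not the team's actual stack.

```python
import math
import re

def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word-window chunks (one common RAG chunking strategy)."""
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + max_words])
        if chunk:
            chunks.append(chunk)
        if start + max_words >= len(words):
            break
    return chunks

def embed(text):
    """Toy bag-of-words 'embedding'; a real pipeline would call an embedding model here."""
    vec = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        vec[token] = vec.get(token, 0) + 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, top_k=2):
    """Rank chunks by similarity to the query and return the top_k (the 'retrieval' step)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]
```

In a production system the same shape holds, but `embed` calls a hosted model, the ranked scan is replaced by an approximate-nearest-neighbor index in a vector store (e.g., pgvector or OpenSearch), and a re-ranking model reorders the candidates before they reach the agent's context.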
Requirements
Bachelor's degree in Computer Science, Engineering, Information Systems, or related field (or equivalent experience)
12+ years of experience across data engineering, data architecture, or analytics platforms (14+ years in lieu of a degree), with at least 5 years in cloud data platforms and enterprise data leadership roles.
Strong experience with modern cloud data stacks (e.g., data warehouses like Redshift/Snowflake/BigQuery, relational databases like PostgreSQL, and object storage) and their use in analytics and AI.
Hands-on experience with vector databases and search technologies (for example PostgreSQL pgvector, Pinecone, OpenSearch, or similar) to support RAG and semantic search workloads.
Demonstrated expertise in designing and governing data models, semantic layers, and data products that serve multiple consuming applications and analytics teams.
Hands-on experience designing or supporting RAG architectures including chunking strategies, embedding pipelines, retrieval optimization, ranking/re-ranking, and evaluation frameworks.
Solid understanding of LLM and agentic AI patterns (prompts, tools, RAG, memory) and how data quality and structure impact AI behavior and performance.
Proven experience building data pipelines for AI/ML use cases including ETL/ELT workflows, streaming data integration, and data preparation for model training and fine-tuning.
Strong experience with Lakehouse architecture using S3, Apache Iceberg, Glue Data Catalog, and Redshift.
Strong Python skills for building data processing, evaluation, and automation pipelines, plus familiarity with DevOps practices (CI/CD, infrastructure as code, environment management).
Good understanding of enterprise data governance and access controls, including AWS Lake Formation, the Glue Data Catalog, and metadata management frameworks.
Good understanding of identity and data security architecture: IAM, IAM Identity Center, cross-account data access patterns, and identity propagation for AI agents and services.
Good understanding of AWS infrastructure concepts (networking, security, storage, compute) and how they apply to data and AI workloads.
Experience working with ETL/ELT pipelines, streaming data, and integration technologies (e.g., CDC, APIs, event buses) for both batch and real-time use cases.
Proven ability to lead multi-disciplinary teams and influence across platform, AI, data, and business stakeholders.
Excellent communication and storytelling skills, with the ability to explain complex data/AI architecture decisions in business terms and secure buy-in at VP/SVP levels.
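The data quality and observability expectations above (freshness SLAs, completeness, business-rule validation) can be sketched with two minimal checks. Function names, field names, and thresholds are illustrative assumptions, not a prescribed framework.

```python
from datetime import datetime, timezone

def check_freshness(last_updated, sla_hours, now=None):
    """Return (meets_sla, lag_hours) for a dataset's last-update timestamp."""
    now = now or datetime.now(timezone.utc)
    lag_hours = (now - last_updated).total_seconds() / 3600
    return lag_hours <= sla_hours, round(lag_hours, 2)

def check_completeness(records, required_fields):
    """Fraction of records in which every required field is populated."""
    if not records:
        return 0.0
    populated = sum(
        1 for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    )
    return populated / len(records)
```

Checks like these would typically run inside the ingestion pipeline and emit metrics, so that a freshness breach or a completeness drop on an AI-ready dataset pages the owning team before it degrades retrieval quality downstream.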
Tech Stack
Amazon Redshift
Apache Iceberg
AWS
BigQuery
Cloud
ETL
PostgreSQL
Python
Benefits
Best-in-class employee benefits and programs that cater to work-life integration and overall well-being