Tek Leaders Inc, a company specializing in data engineering solutions, is seeking a Senior AI Data Engineer to design and build scalable data pipelines for AI agents. The role involves creating data models, integrating data sources, and ensuring data quality and governance across platforms.
Responsibilities:
- Design and build scalable data pipelines for AI agents across cloud platforms
- Create and maintain agent‑ready data models, schemas, and data contracts
- Build and operate vector data pipelines (data prep, chunking, embeddings, indexing, re‑indexing)
- Integrate structured, semi‑structured, and unstructured data sources for agent consumption
- Develop MCP (Model Context Protocol) data adapters/connectors for databases, APIs, SaaS, files, and streams
- Define standard MCP request/response schemas and transformation logic
- Integrate MCPs with the MCP gateway (auth, routing, throttling, observability)
- Build CI/CD pipelines for MCP build, test, deployment, and rollback
- Implement CI/CD pipelines for data pipelines, datasets, and vector stores
- Automate environment promotion (dev/test/prod) for data assets
- Embed data quality checks (schema validation, freshness, completeness) into pipelines
- Design and operate real‑time streaming pipelines (event ingestion, enrichment, aggregation)
- Enable event‑driven data triggers for AI agents
- Build batch + streaming hybrid architectures for historical and real‑time context
- Develop and maintain certified data connectors for Low‑Code / No‑Code platforms
- Standardize enterprise data models for reuse by agents and citizen developers
- Manage secure data access using RBAC, managed identities, secrets, and tokenization
- Monitor data quality, drift, and freshness impacting agent behavior
- Implement data observability and lineage tracking across pipelines and MCPs
- Enforce data governance, classification, and compliance controls
- Optimize data performance, latency, and cost for agent workloads
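The vector-pipeline responsibilities above (data prep, chunking, embeddings, indexing) can be illustrated with a minimal sketch. The `embed` function below is a stand-in that hashes character trigrams into a vector; a production pipeline would call a real embedding model, and the chunk sizes are illustrative, not taken from the posting.

```python
import hashlib
import math

def chunk(text, size=200, overlap=40):
    """Split text into overlapping character chunks for embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text, dim=64):
    """Placeholder embedding: hash character trigrams into a fixed-size,
    L2-normalized vector. A real pipeline would call an embedding model."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def index(chunks):
    """Build an in-memory (chunk, vector) store; re-indexing is just a rebuild."""
    return [(c, embed(c)) for c in chunks]

def search(store, query, k=2):
    """Return the k chunks whose vectors have the highest dot product with the query."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, v)), c) for c, v in store]
    return [c for _, c in sorted(scored, reverse=True)[:k]]
```

In practice the in-memory list would be replaced by a managed vector store, with re-indexing triggered whenever source documents or the embedding model change.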
Requirements:
- Experience building the above using AWS cloud services and open-source tooling
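The data quality checks called out above (schema validation, freshness, completeness) can be embedded in a pipeline as a small gating step. This is a minimal sketch; the field names, timestamp column, and 24-hour freshness threshold used in the example are hypothetical, not from the posting.

```python
from datetime import datetime, timedelta, timezone

def quality_checks(rows, required_fields, ts_field, max_age):
    """Run schema, completeness, and freshness checks on a batch of records.

    Returns (status, failures): status maps each check name to True/False,
    failures records which rows (by index) failed and why, so a pipeline
    can fail fast or quarantine bad records before agents consume them.
    """
    failures = {"schema": [], "completeness": [], "freshness": []}
    now = datetime.now(timezone.utc)
    for i, row in enumerate(rows):
        # Schema validation: every required field must be present.
        missing = [f for f in required_fields if f not in row]
        if missing:
            failures["schema"].append((i, missing))
            continue
        # Completeness: required fields must not be null or empty.
        empty = [f for f in required_fields if row[f] in (None, "")]
        if empty:
            failures["completeness"].append((i, empty))
        # Freshness: the record's timestamp must be within max_age of now.
        if now - row[ts_field] > max_age:
            failures["freshness"].append(i)
    status = {name: not bad for name, bad in failures.items()}
    return status, failures
```

A pipeline step would call this after ingestion and, depending on policy, block promotion to the next environment or raise an observability alert when any check reports False.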