Zeta Global is an AI-Powered Marketing Cloud company that leverages advanced artificial intelligence to improve customer acquisition and retention for marketers. The company is seeking a Staff Data Engineer to lead the design and implementation of a unified semantic data layer that integrates disparate data sources and enables high-performance data access for AI systems.
Responsibilities:
- Design and build a centralized semantic data layer using Cube Core (or an equivalent headless BI technology such as the dbt Semantic Layer or Metriql) that provides a unified, governed abstraction over all company data sources
- Define semantic models, metrics, dimensions, and relationships that map to business domains across marketing, advertising, identity resolution, and customer analytics
- Expose the semantic layer via REST/GraphQL APIs and MCP-compatible tool interfaces purpose-built for consumption by AI agents and LLMs
- Integrate and unify data from heterogeneous systems including MySQL, DynamoDB, Aerospike, Snowflake, Amazon S3 (data lakes), Apache Kafka, Amazon SQS, and other internal data stores
- Build connectors, adapters, and federation layers to query across both operational (OLTP) and analytical (OLAP) data sources in a performant, cost-efficient manner
- Ensure seamless handling of both data at rest (warehouses, lakes, databases) and data in motion (streaming platforms, event buses, message queues)
- Design tool interfaces and API contracts that allow AI agents to discover available data, understand schema semantics, and generate accurate queries autonomously
- Collaborate with AI/ML teams to optimize the semantic layer for LLM-generated SQL, natural language querying, retrieval-augmented generation (RAG), and agentic workflows
- Implement guardrails, query validation, and cost controls to prevent runaway queries from AI-generated workloads
- Architect the semantic layer with native multi-tenant isolation, ensuring strict data segregation and tenant-scoped access controls
- Implement row-level security, column-level masking, and attribute-based access controls (ABAC) to enforce data governance policies
- Ensure compliance with SOC 2, GDPR, CCPA, and industry-specific regulations governing data access, PII handling, and cross-border data flows
- Design for horizontal scalability to support thousands of concurrent queries from AI agents, internal dashboards, and customer-facing products
- Implement intelligent caching (pre-aggregation, materialized views, query result caching) to deliver sub-second response times for common query patterns
- Build observability into the semantic layer with comprehensive metrics, logging, alerting, and query performance profiling
- Serve as the technical authority on data architecture decisions, authoring ADRs (Architecture Decision Records) and reference architectures
- Mentor and guide senior engineers on best practices for semantic modeling, data governance, and API design
- Partner cross-functionally with Product, Data Science, Platform Engineering, InfoSec, and Compliance teams to align the data layer with business objectives
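To make the guardrail and tenant-isolation responsibilities above concrete, here is a minimal sketch of how AI-generated queries might be validated and scoped before reaching a warehouse. All names here (`TenantContext`, `guard_query`, the scan budget) are illustrative assumptions, not part of Cube or any specific product API; a production system would use bind parameters and a real query planner's cost estimate rather than string manipulation.

```python
from dataclasses import dataclass

# Assumed per-query scan budget to stop runaway AI-generated workloads.
MAX_ESTIMATED_ROWS = 1_000_000


@dataclass
class TenantContext:
    """Caller identity used to enforce tenant-scoped access."""
    tenant_id: str


def guard_query(sql: str, ctx: TenantContext, estimated_rows: int) -> str:
    """Wrap an agent-generated query with cost checks and row-level security.

    Raises ValueError if the query exceeds the scan budget or contains
    multiple statements; otherwise returns the query scoped to the tenant.
    """
    if estimated_rows > MAX_ESTIMATED_ROWS:
        raise ValueError(f"query exceeds scan budget: {estimated_rows} rows")
    body = sql.rstrip().rstrip(";")
    if ";" in body:  # naive multi-statement guard for the sketch
        raise ValueError("multi-statement queries are rejected")
    # Row-level security: every query is filtered to the caller's tenant.
    # (In production, inject this via the semantic layer's security context
    # and bound parameters, never via string formatting.)
    return f"SELECT * FROM ({body}) AS q WHERE q.tenant_id = '{ctx.tenant_id}'"
```

The key design point is that isolation is applied by the platform layer, not trusted to the LLM: the agent never sees or controls the tenant predicate.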
Requirements:
- 10+ years of experience in data engineering, data architecture, or platform engineering, with at least 3 years operating at a Staff/Principal level
- Deep hands-on expertise with multiple data stores: relational (MySQL/PostgreSQL), NoSQL (DynamoDB, Aerospike, MongoDB), cloud data warehouses (Snowflake, BigQuery, Redshift), and data lakes (S3, Delta Lake, Iceberg)
- Strong experience with streaming and messaging systems: Apache Kafka, Amazon SQS/SNS, Kinesis, or equivalent
- Proven experience building or operating semantic/metrics layers using Cube.js/Cube Core, dbt Metrics, LookML, or similar technologies
- Expert-level SQL skills and experience with query optimization across distributed systems
- Production experience designing multi-tenant data platforms with strict security and isolation requirements
- Strong understanding of data governance, access control models (RBAC, ABAC), and compliance frameworks (SOC 2, GDPR, CCPA)
- Experience designing and exposing APIs (REST, GraphQL) for data consumption at scale
- BS/MS in Computer Science, Data Engineering, or equivalent practical experience
- Experience building data interfaces specifically for AI/ML consumption, including tool-use APIs for LLM agents, MCP (Model Context Protocol), or function-calling patterns
- Familiarity with AI orchestration frameworks (LangChain, LlamaIndex, Semantic Kernel) and how they interact with external data tools
- Experience with infrastructure-as-code (Terraform, Pulumi), container orchestration (Kubernetes, ECS), and CI/CD pipelines for data platform deployments
- Background in MarTech/AdTech data domains: identity graphs, audience segmentation, campaign analytics, attribution modeling, or real-time bidding data
- Contributions to open-source data tools or published thought leadership on semantic layers, data mesh, or AI-enabled data architectures
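The agent-facing discovery requirement above can be sketched as a pair of tool endpoints that let an LLM enumerate semantic models and inspect their dimensions and metrics before generating a query. The descriptor shape loosely follows function-calling / MCP-style tool conventions; the model names, fields, and functions below are hypothetical examples, not a real schema.

```python
# Hypothetical in-memory registry of semantic models exposed to agents.
SEMANTIC_MODELS = {
    "campaign_performance": {
        "description": "Daily ad campaign metrics, tenant-scoped",
        "dimensions": ["campaign_id", "channel", "date"],
        "metrics": ["impressions", "clicks", "spend_usd"],
    },
    "audience_segments": {
        "description": "Resolved audience segments per identity graph node",
        "dimensions": ["segment_id", "source"],
        "metrics": ["member_count"],
    },
}


def list_models() -> list[dict]:
    """Tool: enumerate available semantic models so an agent can plan."""
    return [{"name": name, **meta} for name, meta in SEMANTIC_MODELS.items()]


def describe_model(name: str) -> dict:
    """Tool: return the schema of one model; raises KeyError if unknown."""
    return {"name": name, **SEMANTIC_MODELS[name]}
```

Exposing schema semantics this way (rather than raw table DDL) is what lets an agent generate accurate, governed queries autonomously, as the role describes.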