Zeta Global is an AI-Powered Marketing Cloud company that leverages advanced artificial intelligence to improve customer acquisition and retention for marketers. The company is seeking a Staff Data Engineer to lead the design and implementation of a unified semantic data layer that integrates disparate data sources and enables high-performance data access for AI systems.
Responsibilities:
- Design and build a centralized semantic data layer using Cube Core (or an equivalent headless BI technology such as the dbt Semantic Layer or Metriql) that provides a unified, governed abstraction over all company data sources
- Define semantic models, metrics, dimensions, and relationships that map to business domains across marketing, advertising, identity resolution, and customer analytics
- Expose the semantic layer via REST/GraphQL APIs and MCP-compatible tool interfaces purpose-built for consumption by AI agents and LLMs
- Integrate and unify data from heterogeneous systems including MySQL, DynamoDB, Aerospike, Snowflake, Amazon S3 (data lakes), Apache Kafka, Amazon SQS, and other internal data stores
- Build connectors, adapters, and federation layers to query across both operational (OLTP) and analytical (OLAP) data sources in a performant, cost-efficient manner
- Ensure seamless handling of both data at rest (warehouses, lakes, databases) and data in motion (streaming platforms, event buses, message queues)
- Design tool interfaces and API contracts that allow AI agents to discover available data, understand schema semantics, and generate accurate queries autonomously
- Collaborate with AI/ML teams to optimize the semantic layer for LLM-generated SQL, natural language querying, retrieval-augmented generation (RAG), and agentic workflows
- Implement guardrails, query validation, and cost controls to prevent runaway queries from AI-generated workloads
- Architect the semantic layer with native multi-tenant isolation, ensuring strict data segregation and tenant-scoped access controls
- Implement row-level security, column-level masking, and attribute-based access controls (ABAC) to enforce data governance policies
- Ensure compliance with SOC 2, GDPR, CCPA, and industry-specific regulations governing data access, PII handling, and cross-border data flows
- Design for horizontal scalability to support thousands of concurrent queries from AI agents, internal dashboards, and customer-facing products
- Implement intelligent caching (pre-aggregation, materialized views, query result caching) to deliver sub-second response times for common query patterns
- Build observability into the semantic layer with comprehensive metrics, logging, alerting, and query performance profiling
- Serve as the technical authority on data architecture decisions, authoring ADRs (Architecture Decision Records) and reference architectures
- Mentor and guide senior engineers on best practices for semantic modeling, data governance, and API design
- Partner cross-functionally with Product, Data Science, Platform Engineering, InfoSec, and Compliance teams to align the data layer with business objectives
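To make the guardrail and tenant-isolation responsibilities above concrete, here is a minimal sketch of how AI-generated queries might be validated and scoped before reaching a warehouse. All names here (`TenantContext`, `guard_query`, the scan budget) are illustrative assumptions, not part of Cube or any specific product API; a production system would use bind parameters and a real query planner's cost estimate rather than string manipulation.

```python
from dataclasses import dataclass

# Assumed per-query scan budget to stop runaway AI-generated workloads.
MAX_ESTIMATED_ROWS = 1_000_000


@dataclass
class TenantContext:
    """Caller identity used to enforce tenant-scoped access."""
    tenant_id: str


def guard_query(sql: str, ctx: TenantContext, estimated_rows: int) -> str:
    """Wrap an agent-generated query with cost checks and row-level security.

    Raises ValueError if the query exceeds the scan budget or contains
    multiple statements; otherwise returns the query scoped to the tenant.
    """
    if estimated_rows > MAX_ESTIMATED_ROWS:
        raise ValueError(f"query exceeds scan budget: {estimated_rows} rows")
    body = sql.rstrip().rstrip(";")
    if ";" in body:  # naive multi-statement guard for the sketch
        raise ValueError("multi-statement queries are rejected")
    # Row-level security: every query is filtered to the caller's tenant.
    # (In production, inject this via the semantic layer's security context
    # and bound parameters, never via string formatting.)
    return f"SELECT * FROM ({body}) AS q WHERE q.tenant_id = '{ctx.tenant_id}'"
```

The key design point is that isolation is applied by the platform layer, not trusted to the LLM: the agent never sees or controls the tenant predicate.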
Requirements:
- 10+ years of experience in data engineering, data architecture, or platform engineering, with at least 3 years operating at a Staff/Principal level
- Deep hands-on expertise with multiple data stores: relational (MySQL/PostgreSQL), NoSQL (DynamoDB, Aerospike, MongoDB), cloud data warehouses (Snowflake, BigQuery, Redshift), and data lakes (S3, Delta Lake, Iceberg)
- Strong experience with streaming and messaging systems: Apache Kafka, Amazon SQS/SNS, Kinesis, or equivalent
- Proven experience building or operating semantic/metrics layers using Cube.js/Cube Core, dbt Metrics, LookML, or similar technologies
- Expert-level SQL skills and experience with query optimization across distributed systems
- Production experience designing multi-tenant data platforms with strict security and isolation requirements
- Strong understanding of data governance, access control models (RBAC, ABAC), and compliance frameworks (SOC 2, GDPR, CCPA)
- Experience designing and exposing APIs (REST, GraphQL) for data consumption at scale
- BS/MS in Computer Science, Data Engineering, or equivalent practical experience
- Experience building data interfaces specifically for AI/ML consumption, including tool-use APIs for LLM agents, MCP (Model Context Protocol), or function-calling patterns
- Familiarity with AI orchestration frameworks (LangChain, LlamaIndex, Semantic Kernel) and how they interact with external data tools
- Experience with infrastructure-as-code (Terraform, Pulumi), container orchestration (Kubernetes, ECS), and CI/CD pipelines for data platform deployments
- Background in MarTech/AdTech data domains: identity graphs, audience segmentation, campaign analytics, attribution modeling, or real-time bidding data
- Contributions to open-source data tools or published thought leadership on semantic layers, data mesh, or AI-enabled data architectures
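The agent-facing discovery requirement above can be sketched as a pair of tool endpoints that let an LLM enumerate semantic models and inspect their dimensions and metrics before generating a query. The descriptor shape loosely follows function-calling / MCP-style tool conventions; the model names, fields, and functions below are hypothetical examples, not a real schema.

```python
# Hypothetical in-memory registry of semantic models exposed to agents.
SEMANTIC_MODELS = {
    "campaign_performance": {
        "description": "Daily ad campaign metrics, tenant-scoped",
        "dimensions": ["campaign_id", "channel", "date"],
        "metrics": ["impressions", "clicks", "spend_usd"],
    },
    "audience_segments": {
        "description": "Resolved audience segments per identity graph node",
        "dimensions": ["segment_id", "source"],
        "metrics": ["member_count"],
    },
}


def list_models() -> list[dict]:
    """Tool: enumerate available semantic models so an agent can plan."""
    return [{"name": name, **meta} for name, meta in SEMANTIC_MODELS.items()]


def describe_model(name: str) -> dict:
    """Tool: return the schema of one model; raises KeyError if unknown."""
    return {"name": name, **SEMANTIC_MODELS[name]}
```

Exposing schema semantics this way (rather than raw table DDL) is what lets an agent generate accurate, governed queries autonomously, as the role describes.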