Zeta Global is an AI-powered marketing cloud company that simplifies sophisticated marketing through its Zeta Marketing Platform. It is seeking a Senior Data Engineer to design and operate data pipelines and aggregates for its AdTech platform, focusing on high-scale data processing and analytics-ready datasets.
Responsibilities:
- Build data pipelines: Develop robust batch and streaming pipelines (Kafka/Kinesis) to ingest, transform, and enrich large-scale event data (impressions, clicks, conversions, costs, identity signals)
- Create data aggregates & marts: Design and maintain curated aggregates and dimensional models for multiple consumers: prediction models, agents, BI dashboards, and measurement workflows
- Data modeling & contracts: Define schemas, data contracts, and versioning strategies to keep downstream systems stable as sources evolve
- Data quality & reliability: Implement validation, anomaly detection, backfills, and reconciliation to ensure completeness, correctness, and timeliness (SLAs/SLOs)
- Performance & cost optimization: Optimize compute/storage for scale (partitioning, file sizing, incremental processing, indexing), balancing latency, throughput, and cost
- Orchestration & automation: Build repeatable workflows with scheduling/orchestration (e.g., Airflow, Dagster, Step Functions) and CI/CD for data pipelines
- Observability for data systems: Instrument pipelines with metrics, logs, lineage, and alerting to accelerate detection and root-cause analysis of data issues
- Security & governance: Apply least-privilege access, PII-aware handling, and governance controls aligned with enterprise standards
Requirements:
- 5+ years building production data pipelines and data products (batch and/or streaming) in a high-scale environment
- Strong experience with SQL and data modeling (dimensional modeling, star/snowflake schemas, event modeling)
- Hands-on experience with streaming systems such as Kafka (preferred) or AWS Kinesis, including event-driven designs
- Proficiency in one or more languages used for data engineering (Python, Java, Scala, or Go)
- Experience with distributed data processing (Spark, Flink, or equivalent) and performance tuning at scale
- Experience with AWS data services and cloud-native patterns (e.g., S3, Glue/EMR, Athena, Redshift)
- Familiarity with lakehouse/table formats and large-scale storage patterns (e.g., Parquet; Iceberg/Hudi/Delta are a plus)
- Experience with orchestration/workflow tooling (Airflow/Dagster/Step Functions) and CI/CD for data workloads
- Strong data quality/observability practices (tests, monitoring, lineage; understanding of SLAs/SLOs)
- Experience with SQL and NoSQL data stores (e.g., Postgres/MySQL; DynamoDB/Cassandra/Redis) and choosing the right store per use case
- Clear communicator and collaborator; able to work with mixed audiences and translate needs into reliable data interfaces
- AdTech / programmatic advertising domain knowledge: DSP/SSP/exchange/RTB concepts and data flows
- Experience building measurement pipelines (attribution, incrementality, lift, or experimentation analytics)
- Experience supporting ML feature stores, offline/online feature generation, or model training datasets
- Experience with real-time analytics stores (Druid/ClickHouse/Pinot) and high-cardinality aggregation strategies
- Deep knowledge of data governance/privacy, including PII handling and consent-aware data processing
- Open-source contributions, publications, or conference speaking
- BS/MS in CS/Engineering or equivalent practical experience