Vibe reimagines how brands reach audiences in the age of streaming with its Audience First Streaming TV Advertising solution. The Director of Data Engineering will own the infrastructure that ingests and governs terabytes of data daily, and will lead the development of the ML platform and data architecture supporting analytics and real-time bidding systems.
Responsibilities:
- Own the technical design of Nebula — Vibe's identity graph, entity resolution pipelines, and data clean room integrations with partners including major broadcasters
- Define the end-to-end data architecture serving Analytics, ML, and real-time bidding systems
- Solve training vs. inference skew: ensure data used to train models matches data available at bid time
- Design and ship the ML platform — feature stores, model registries, and CI/CD for ML — so data scientists can deploy models to production without infrastructure blockers
- Own the "golden path": a data scientist pushes code, a model retrains and deploys automatically
- Bridge the gap between Data Engineering and Data Science; remove friction, not just document it
- Champion the transition to a security-first culture — SOC 2, RBAC, PII anonymization — without turning compliance into a bottleneck
- Build guardrails, not gatekeepers: automated policy checks that let engineers ship fast and safely
- Own data retention policies, access controls, and governance frameworks across 200+ data assets
- Hold the Data Platform P&L — track unit economics, separate storage costs from ML training costs, and ensure spend scales with revenue rather than ahead of it
- Optimise across a hybrid stack: high-volume streaming (Kafka/Kinesis), log storage (S3/Athena), and GPU compute for ML training
- Identify waste fast; distinguish inefficiency from intentional growth investment
- Manage and grow the Data Platform Engineering team
- Assess current team capabilities against what's needed to ship Nebula and the ML platform
- Build a culture where engineers adopt structure because it makes them faster, not because they're told to
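The training-versus-inference skew mentioned above is commonly addressed by deriving features from a single shared definition, replayed point-in-time for training and called live at bid time. A minimal sketch of that idea, with a hypothetical `compute_user_ctr` feature (names and windows are illustrative, not Vibe's actual pipeline):

```python
from datetime import datetime, timedelta

def compute_user_ctr(impressions, clicks, as_of):
    """Click-through rate over the 7 days before `as_of`.

    Using one definition for both offline training and the
    bid-time path keeps the two from drifting apart.
    """
    window_start = as_of - timedelta(days=7)
    imps = [t for t in impressions if window_start <= t < as_of]
    clks = [t for t in clicks if window_start <= t < as_of]
    if not imps:
        return 0.0
    return len(clks) / len(imps)

# Offline, replay history with each label's timestamp as `as_of`,
# so training never sees events that arrived after bid time.
now = datetime(2024, 1, 8)
impressions = [now - timedelta(days=d) for d in (1, 2, 3, 10)]
clicks = [now - timedelta(days=2)]
print(compute_user_ctr(impressions, clicks, as_of=now))  # 3 impressions in window, 1 click
```

In production this shared definition would live in a feature store such as Feast or Tecton, which is exactly what the tooling listed under Requirements provides.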
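"Guardrails, not gatekeepers" typically means governance rules encoded as automated checks in CI rather than manual review. A hypothetical sketch of one such check, failing fast when a dataset schema exposes a PII-flagged column without an approved masking policy (field names and mask labels are illustrative assumptions):

```python
# Illustrative policy check: columns known to carry PII must
# declare an approved masking strategy before the dataset ships.
PII_FIELDS = {"email", "ip_address", "device_id"}
APPROVED_MASKS = {"sha256", "tokenized", "dropped"}

def check_schema(schema):
    """Return policy violations for one dataset schema.

    `schema` maps column name -> {"mask": <str or None>}.
    """
    violations = []
    for column, props in schema.items():
        if column in PII_FIELDS and props.get("mask") not in APPROVED_MASKS:
            violations.append(f"{column}: PII column lacks an approved mask")
    return violations

schema = {
    "email": {"mask": "sha256"},    # ok: hashed
    "ip_address": {"mask": None},   # violation
    "campaign_id": {"mask": None},  # ok: not PII
}
for violation in check_schema(schema):
    print(violation)
```

A check like this runs in seconds on every pull request, so engineers get a pass/fail answer instead of waiting on a compliance review.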
Requirements:
- Hands-on experience architecting large-scale data platforms — you've designed systems ingesting TBs of data, not just managed teams that did
- Deep knowledge of data governance and compliance frameworks — CCPA, GDPR, SOC 2 — and a track record of implementing them without killing engineering velocity
- Experience with identity resolution, device graphs, or privacy-safe data matching (clean rooms, entity resolution)
- Strong understanding of the ML lifecycle: data prep, training, deployment, monitoring — and the infrastructure that makes it work at scale
- Experience owning cloud infrastructure costs and optimising unit economics across AWS or GCP
- Prior experience in a regulated data environment — you understand publisher contracts, DPAs, and what data you can and cannot use
- Hands-on experience with clean room technologies (Snowflake Data Clean Rooms, AWS Clean Rooms, LiveRamp Safe Haven, or similar)
- Familiarity with MLOps tooling — feature stores (Feast, Tecton), model serving (SageMaker, Ray Serve), orchestration (Airflow, Dagster)
- Background in streaming TV, connected TV, or programmatic advertising infrastructure
- Experience leading a technical team through a SOC 2 certification process