Own the technical design of Nebula — Vibe's identity graph, entity resolution pipelines, and data clean room integrations with partners including major broadcasters.
Define the end-to-end data architecture serving Analytics, ML, and real-time bidding systems.
Solve training/serving skew: ensure the data used to train models matches the data available at bid time.
Design and ship the ML platform — feature stores, model registries, and CI/CD for ML — so data scientists can deploy models to production without infrastructure blockers.
Own the "golden path": a data scientist pushes code, a model retrains and deploys automatically.
Bridge the gap between Data Engineering and Data Science; remove friction, not just document it.
Champion the transition to a security-first culture — SOC2, RBAC, PII anonymization — without turning compliance into a bottleneck.
Build guardrails, not gatekeepers: automated policy checks that let engineers ship fast and safely.
Own data retention policies, access controls, and governance frameworks across 200+ data assets.
Hold the Data Platform P&L — track unit economics, separate storage costs from ML training costs, and ensure spend scales with revenue rather than ahead of it.
Optimise across a hybrid stack: high-volume streaming (Kafka/Kinesis), log storage (S3/Athena), and GPU compute for ML training.
Identify waste fast; distinguish inefficiency from intentional growth investment.
Manage and grow the Data Platform Engineering team.
Assess current team capabilities against what's needed to ship Nebula and the ML platform.
Build a culture where engineers adopt structure because it makes them faster, not because they're told to.
Requirements
Hands-on experience architecting large-scale data platforms — you've designed systems ingesting TBs of data, not just managed teams that did.
Deep knowledge of data governance and compliance frameworks — CCPA, GDPR, SOC2 — and a track record of implementing them without killing engineering velocity.
Experience with identity resolution, device graphs, or privacy-safe data matching (clean rooms, entity resolution).
Strong understanding of the ML lifecycle: data prep, training, deployment, monitoring — and the infrastructure that makes it work at scale.
Experience owning cloud infrastructure costs and optimising unit economics across AWS or GCP.
Prior experience in a regulated data environment — you understand publisher contracts, DPAs, and what data you can and cannot use.
Tech Stack
AWS
Google Cloud Platform
Kafka
Benefits
Equity — Employee Stock Ownership Plan. You're building this; you should own part of it.
Variable pay — Based on objectives you hit. No arbitrary targets.
Hybrid flexibility — We're in the heart of Paris, and the team is in the office three days a week.
Health insurance — Full coverage via Alan.
Meal vouchers — Via Swile.
Annual offsite — The whole team, once a year, somewhere worth the trip.
Tech Syncs — Engineering and Product meet in person at least quarterly, worldwide.