Yahoo is an industry-leading direct-to-consumer and ad tech solution for advertisers and publishers. We are seeking a Senior Data Engineer to build data engineering pipelines and next-generation machine learning and AI-based data infrastructure, supporting new functionality and mining data for analytics insights and product features.
Responsibilities:
- Improve our existing data infrastructure for machine learning and deep learning using your core expertise
- Design and build unified, production-grade streaming and batch data pipelines that achieve full event coverage with near-real-time latency
- Develop schema optimization and compression strategies for efficient large-scale data ingestion and storage
- Build the data foundation for ML training pipelines—including feature engineering, real-time feature serving, and batch feature computation—that powers yield optimization and predictive analytics
- Work with other engineers to implement algorithms and systems in an efficient way
- Take end-to-end ownership of Machine Learning-based distributed data systems—from data pipelines and training, to real-time prediction engines
- Develop complex queries, very large-volume data pipelines, and software programs to solve analytics and data mining problems
- Build data quality monitoring systems, automated anomaly detection, and reconciliation processes for production-grade revenue operations
- Interact with data analysts, data scientists, product managers, and software engineers to understand business problems and technical requirements to deliver data solutions
- Prototype new metrics or data systems
- Lead data investigations to troubleshoot issues that arise across data pipelines
- Maintain and improve released systems
- Provide engineering consulting on large, complex warehouse data
Requirements:
- BS in Computer Science with 7+ years of relevant industry experience, or MS in Computer Science with 5+ years of relevant industry experience, ideally with a specialization in Data Engineering or Machine Learning
- Strong fundamentals: algorithms, distributed computing, data structures, databases
- Fluency with at least one of: Go/Java/Python/C++/Scala/SQL
- 5+ years of industry experience developing very large-scale analytics or ML systems
- 2+ years of experience with Google Cloud Platform (BigQuery, Dataproc, Composer, Dataflow, BigTable, etc.)
- 2+ years of experience with Hadoop-ecosystem technologies (MapReduce, Pig, Hive, HBase, Spark, Kafka, Oozie, etc.)
- Experience in data modeling, schema design, ETL, and data analysis
- Self-driven, challenge-loving, detail-oriented team player with excellent communication skills and the ability to multitask and manage expectations
- Experience with machine learning algorithms, NLP, and/or statistical methods a big plus
- Experience in any of: machine learning, analytics, data mining, or data mart and warehouse
- Experience with deep learning platforms (TensorFlow, Keras, Spark MLlib)
- Experience in ad tech, programmatic advertising, or publisher-side monetization platforms
- Experience building data quality frameworks, automated reconciliation systems, and observability for data pipelines (e.g., OpenTelemetry)
- Experience with privacy-enhancing technologies, data clean rooms, or identity resolution systems