Yahoo is an industry-leading direct-to-consumer and ad tech solution for advertisers and publishers. We are seeking a Senior Data Engineer to build data engineering pipelines and next-generation machine learning and AI-based data infrastructure, supporting new functionality and mining data for analytics insights and product features.
Responsibilities:
- Improve our existing data infrastructure for machine learning and deep learning using your core expertise
- Design and build unified, production-grade streaming and batch data pipelines that achieve full event coverage with near-real-time latency
- Develop schema optimization and compression strategies for efficient large-scale data ingestion and storage
- Build the data foundation for ML training pipelines—including feature engineering, real-time feature serving, and batch feature computation—that powers yield optimization and predictive analytics
- Work with other engineers to implement algorithms and systems in an efficient way
- Take end-to-end ownership of Machine Learning-based distributed data systems—from data pipelines and training, to real-time prediction engines
- Develop complex queries, very large-volume data pipelines, and software programs to solve analytics and data mining problems
- Build data quality monitoring systems, automated anomaly detection, and reconciliation processes for production-grade revenue operations
- Interact with data analysts, data scientists, product managers, and software engineers to understand business problems and technical requirements to deliver data solutions
- Prototype new metrics or data systems
- Lead data investigations to troubleshoot issues that arise across data pipelines
- Maintain and improve released systems
- Provide engineering consulting on large, complex warehouse data
Requirements:
- BS in Computer Science with 7+ years of relevant industry experience, or MS in Computer Science with 5+ years of relevant industry experience, ideally with a specialization in Data Engineering or Machine Learning
- Strong fundamentals: algorithms, distributed computing, data structures, databases
- Fluency with at least one of: Go/Java/Python/C++/Scala/SQL
- 5+ years of industry experience developing very large-scale analytics or ML systems
- 2+ years of experience with Google Cloud Platform (BigQuery, Dataproc, Composer, Dataflow, BigTable, etc.)
- 2+ years of experience with Hadoop-ecosystem technologies (MapReduce, Pig, Hive, HBase, Spark, Kafka, Oozie, etc.)
- Experience in data modeling, schema design, ETL, and data analysis
- Self-driven, challenge-loving, detail-oriented team player with excellent communication skills and the ability to multitask and manage expectations
- Experience with machine learning algorithms, NLP, and/or statistical methods a big plus
- Experience in any of: machine learning, analytics, data mining, or data mart and warehouse
- Experience with deep learning platforms (TensorFlow, Keras, Spark MLlib)
- Experience in ad tech, programmatic advertising, or publisher-side monetization platforms
- Experience building data quality frameworks, automated reconciliation systems, and observability for data pipelines (e.g., OpenTelemetry)
- Experience with privacy-enhancing technologies, data clean rooms, or identity resolution systems