Yahoo serves as a trusted guide for hundreds of millions of people globally, helping them achieve their goals online through our portfolio of iconic products. The ideal candidate will have strong distributed systems knowledge and AI/ML experience to design, build, and optimize the scalable data pipelines and infrastructure that power advanced analytics and machine learning solutions.
Responsibilities:
- Apply software engineering expertise to build high-performance, scalable data warehouses
- Be excited to learn and to take ownership of large-scale projects spanning many tech stacks and environments
- Design, build, and launch efficient, reliable data pipelines that move and transform data at multi-petabyte scale using the latest technologies
- Build real-time ingestion and analytics pipelines capable of processing more than a million events per second and delivering insights at sub-second latency (a streaming pipeline sketch follows this list)
- Interact with product owners and end users to understand and address new business requirements as they emerge
- Design and audit processes that ensure delivery of high-quality data through rigorous QA checks (a data-quality check sketch also follows this list)
- Apply strong data modeling skills and an understanding of the nuances of the various dimension and metric types in warehouses
- Design workflows to ingest, load, and present new data sets for users
- Participate in the on-call rotation for production pipelines (typically a couple of times each quarter)
- Define and manage SLAs for all data sets in allocated areas of ownership
- Work with the production engineering / infrastructure team to drive resolution of production issues
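To make the streaming responsibility concrete, here is a minimal sketch of the kind of real-time pipeline this role builds, written with the Apache Beam Python SDK (which Dataflow executes). The Pub/Sub topic, BigQuery table, and event schema are hypothetical placeholders, not actual Yahoo systems.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows


def run():
    # streaming=True tells the runner (e.g. Dataflow) this pipeline is unbounded.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                topic="projects/example-project/topics/events")  # hypothetical topic
            | "Parse" >> beam.Map(lambda raw: json.loads(raw.decode("utf-8")))
            | "KeyByUser" >> beam.Map(lambda event: (event["user_id"], 1))
            | "Window" >> beam.WindowInto(FixedWindows(60))  # 1-minute fixed windows
            | "CountPerUser" >> beam.CombinePerKey(sum)
            | "ToRow" >> beam.Map(lambda kv: {"user_id": kv[0], "events": kv[1]})
            | "WriteCounts" >> beam.io.WriteToBigQuery(
                "example-project:analytics.user_event_counts",  # hypothetical table
                schema="user_id:STRING,events:INTEGER",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```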
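And a hedged sketch of a post-load QA check of the sort the data-quality bullet describes: compare the loaded row count to an expected count and bound the null rate on a key column. It uses the google-cloud-bigquery client library; the table and column names are illustrative, not prescribed.

```python
from google.cloud import bigquery


def check_load(client: bigquery.Client, table: str, key_col: str,
               expected_rows: int, max_null_rate: float = 0.01) -> None:
    """Fail loudly if the loaded table misses the expected row count
    or its key column has too many nulls. All names are hypothetical."""
    query = f"""
        SELECT COUNT(*) AS total,
               COUNTIF({key_col} IS NULL) AS null_keys
        FROM `{table}`
    """
    row = next(iter(client.query(query).result()))
    if row.total != expected_rows:
        raise ValueError(f"{table}: expected {expected_rows} rows, got {row.total}")
    if row.total and row.null_keys / row.total > max_null_rate:
        raise ValueError(f"{table}: null rate on {key_col} exceeds {max_null_rate:.0%}")


# Example usage (hypothetical table and key):
# check_load(bigquery.Client(), "warehouse.dim_user", "user_key", expected_rows=1_000_000)
```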
Requirements:
- BS/MS in Computer Science, Mathematics, or Statistics
- 4+ years of experience in relevant software development, including at least 2 years of professional Java and/or Python experience
- 2+ years of experience in the Big Data pipeline and analytics space, across multiple technology stacks
- 2+ years of experience designing, implementing, and maintaining custom ETL on Big Data stacks (Hadoop, MapReduce, Pig, Hive, AWS EMR, Apache Beam, Google Cloud Dataflow, BigQuery)
- Experience or familiarity with some of the following: Kafka, Storm, stream processing (Spark Streaming, Dataflow), Elasticsearch
- Experience designing, building, and maintaining scalable data pipelines and ETL processes that support machine learning and AI initiatives on Google Cloud Platform (GCP)
- Experience implementing and optimizing data storage using GCP services such as BigQuery, Cloud Storage, and Dataflow (a partitioning sketch follows this list)
- Ability to ensure data quality, integrity, and security throughout the data lifecycle
- Ability to collaborate with data scientists, analysts, and business stakeholders to understand data requirements and deliver actionable insights
- Experience monitoring, troubleshooting, and maintaining the health and performance of cloud-based data infrastructure
- Track record of automating manual, repetitive processes to improve efficiency and reduce errors
- Working knowledge of data governance and compliance best practices for protecting sensitive information and meeting regulatory standards
- Commitment to staying current with new GCP features, tools, and best practices to continuously enhance data management capabilities
- Habit of documenting solutions, processes, and architectural decisions to facilitate knowledge sharing and maintainability
- Experience with MapReduce or another parallel data processing system
- Experience with schema design and dimensional data modeling
- Comfortable writing complex SQL queries (a query sketch closes this section)
- Strong data mindset with a deep appreciation for analyzing data to identify product gaps and enhancements that improve user engagement and revenue growth
- Excellent communication skills, including the ability to tell insightful stories with data and to manage communication with internal teams and stakeholders
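As one illustration of the BigQuery storage optimization mentioned above, the sketch below creates a day-partitioned, clustered table with the google-cloud-bigquery client; partitioning and clustering prune the bytes a query has to scan. The project, dataset, and column names are hypothetical placeholders.

```python
from google.cloud import bigquery


def create_events_table(client: bigquery.Client) -> bigquery.Table:
    table = bigquery.Table(
        "example-project.warehouse.fact_ad_events",  # hypothetical table
        schema=[
            bigquery.SchemaField("event_ts", "TIMESTAMP", mode="REQUIRED"),
            bigquery.SchemaField("user_key", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("revenue_usd", "NUMERIC"),
        ],
    )
    # Partition by day on event_ts; cluster by user_key so filtered
    # queries scan only the relevant partitions and blocks.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY, field="event_ts")
    table.clustering_fields = ["user_key"]
    return client.create_table(table)
```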
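And a sketch of the kind of "complex SQL" the last requirements have in mind: a star-schema query that joins a fact table to date and user dimensions and layers a window function over the aggregate, run through the same client library. All table and column names are invented for illustration.

```python
from google.cloud import bigquery

# Star-schema join plus a trailing 7-day window over the daily aggregate.
QUERY = """
SELECT
  d.calendar_date,
  u.country,
  SUM(f.revenue_usd) AS daily_revenue,
  SUM(SUM(f.revenue_usd)) OVER (
    PARTITION BY u.country
    ORDER BY d.calendar_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS trailing_7d_revenue
FROM `example-project.warehouse.fact_ad_events` AS f
JOIN `example-project.warehouse.dim_date` AS d USING (date_key)
JOIN `example-project.warehouse.dim_user` AS u USING (user_key)
GROUP BY d.calendar_date, u.country
ORDER BY u.country, d.calendar_date
"""


def main():
    client = bigquery.Client()
    for row in client.query(QUERY).result():
        print(row.calendar_date, row.country, row.trailing_7d_revenue)


if __name__ == "__main__":
    main()
```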