Yahoo is a leading technology company known for its consumer inbox service, Yahoo Mail. The Senior Data Engineer will play a crucial role in defining data ontology, building data pipelines, and collaborating with cross-functional teams to enhance data operations and analytics capabilities.
Responsibilities:
- Partner with Data Science, Product, and Engineering to collect requirements to define the data ontology for Mail Data & Analytics
- Lead and mentor junior Data Engineers to support Yahoo Mail’s ever-evolving data needs
- Design, build, and maintain efficient and reliable batch data pipelines to populate core data sets
- Develop scalable frameworks and tooling to automate analytics workflows and streamline user interactions with data products
- Establish and promote standard methodologies for data operations and lifecycle management
- Develop new, or improve and maintain existing, large-scale data infrastructure and systems for data processing or serving, optimizing complex code through advanced algorithmic concepts and an in-depth understanding of the underlying data system stacks
- Create and contribute to frameworks that improve the efficacy of the management and deployment of data platforms and systems, while working with data infrastructure to triage and resolve issues
- Prototype new metrics or data systems
- Define and manage Service Level Agreements for all data sets in allocated areas of ownership
- Develop complex queries, high-volume data pipelines, and analytics applications to solve analytics and data engineering problems
- Collaborate with engineers, data scientists, and product managers to understand business problems and technical requirements, and to deliver data solutions
- Provide engineering consulting on large, complex data lakehouse datasets
Requirements:
- BS in Computer Science/Engineering, relevant technical field, or equivalent practical experience, with specialization in Data Engineering
- 6+ years of experience building scalable ETL pipelines on industry-standard ETL orchestration tools (Airflow, Composer, Oozie), with deep expertise in SQL, PySpark, or Scala
- Experience building, scaling, and maintaining multi-terabyte data sets, with an expansive toolbox for debugging and unblocking large-scale analytics challenges (skew mitigation, sampling strategies, accumulation patterns, data sketches, etc.)
- Experience with at least one major cloud's suite of offerings (AWS, GCP, Azure)
- Experience developing or enhancing ETL orchestration tools or frameworks
- Experience working within a standard GitOps workflow (branching and merging, PRs, CI/CD systems)
- Experience working with GDPR compliance requirements
- Highly self-motivated with a strong sense of ownership
- Detail-oriented with a commitment to quality and accuracy
- Collaborative team player who contributes positively to group success
- Strong written and verbal communication skills
- Able to prioritize effectively, manage multiple tasks, and set clear expectations
- 3+ years of experience with Google Cloud Platform technologies (BigQuery, Dataproc, Dataflow, Composer, Looker)