Yahoo is a leading technology company known for its consumer inbox service, Yahoo Mail. The Senior Data Engineer will play a crucial role in defining data ontology, building data pipelines, and collaborating with cross-functional teams to enhance data operations and analytics capabilities.
Responsibilities:
- Partner with Data Science, Product, and Engineering to collect requirements to define the data ontology for Mail Data & Analytics
- Lead and mentor junior Data Engineers to support Yahoo Mail’s ever-evolving data needs
- Design, build, and maintain efficient and reliable batch data pipelines to populate core data sets
- Develop scalable frameworks and tooling to automate analytics workflows and streamline user interactions with data products
- Establish and promote standard methodologies for data operations and lifecycle management
- Develop new, or improve and maintain existing, large-scale data infrastructure and systems for data processing or serving, optimizing complex code through advanced algorithmic concepts and an in-depth understanding of the underlying data system stacks
- Create and contribute to frameworks that improve the efficacy of the management and deployment of data platforms and systems, while working with data infrastructure to triage and resolve issues
- Prototype new metrics or data systems
- Define and manage Service Level Agreements for all data sets in allocated areas of ownership
- Develop complex queries, high-volume data pipelines, and analytics applications to solve analytics and data engineering problems
- Collaborate with engineers, data scientists, and product managers to understand business problems and technical requirements, and to deliver data solutions
- Provide engineering consulting on large, complex data lakehouse datasets
Requirements:
- BS in Computer Science/Engineering, relevant technical field, or equivalent practical experience, with specialization in Data Engineering
- 6+ years of experience building scalable ETL pipelines on industry-standard ETL orchestration tools (Airflow, Composer, Oozie), with deep expertise in SQL, PySpark, or Scala
- Experience building, scaling, and maintaining multi-terabyte data sets, with an expansive toolbox for debugging and unblocking large-scale analytics challenges (skew mitigation, sampling strategies, accumulation patterns, data sketches, etc.)
- Experience with at least one major cloud's suite of offerings (AWS, GCP, Azure)
- Experience developing or enhancing ETL orchestration tools or frameworks
- Experience working within a standard GitOps workflow (branching and merging, PRs, CI/CD systems)
- Experience working with GDPR compliance requirements
- Highly self-motivated with a strong sense of ownership
- Detail-oriented with a commitment to quality and accuracy
- Collaborative team player who contributes positively to group success
- Strong written and verbal communication skills
- Able to prioritize effectively, manage multiple tasks, and set clear expectations
- 3+ years of experience with Google Cloud Platform technologies (BigQuery, Dataproc, Dataflow, Composer, Looker)