Yahoo is a leading company whose portfolio of products serves millions of users globally. They are seeking a Software Data Engineer II to build innovative data solutions and enhance their cloud-first data ecosystem, with a focus on designing robust data pipelines and collaborating with analytics teams.
Responsibilities:
- Design and develop robust, scalable, and resilient data pipelines using both streaming and batch processing technologies
- Collaborate with analytics and reporting teams to understand their needs and deliver well-structured data models
- Architect and implement a variety of data solutions by applying concepts such as Lambda, Kappa, and streaming architectures, as well as ETL/ELT patterns
- Propose, prototype, and implement new technologies and best practices to improve our data ecosystem's efficiency and reliability
- Contribute to the full data lifecycle, from data ingestion and transformation to modeling and warehousing
- Participate in company Data Working Groups to architect data strategy for search and help define and enforce best practices for data quality, governance, and architecture
Requirements:
- B.S. or M.S. in Computer Science (or equivalent experience)
- 3+ years of related industry experience
- Strong background in data engineering, with a proven track record of designing and building scalable data platforms
- Proactive self-starter who can take the lead on projects, sell new ideas, and drive them to completion with a high degree of autonomy
- Deep understanding of data modeling principles and modern data architecture concepts
- Excited by the challenge of working with diverse datasets in both batch and real-time streaming modes
- Strong collaborator with excellent communication skills, capable of working with both technical and non-technical stakeholders
- Fundamental belief in leveraging AI and machine learning as a core component of every data solution, from design to implementation
- Deep expertise in GCP and/or AWS data services
- Experience with modern cloud warehouses like BigQuery, Snowflake, Redshift, Databricks, or Dremio
- Proficiency with object storage (Google Cloud Storage, Amazon S3); relational, document, and wide-column databases; columnar formats (Parquet, ORC, etc.); schema formats (Avro, Protocol Buffers, etc.); and compression formats (gzip, Snappy, bzip2, etc.)
- Hands-on experience with distributed data processing engines such as Apache Flink, Apache Beam, Spark, or similar technologies
- Expertise in technologies like Cloud Functions, Cloud Run, Kubernetes, Dataflow, Dataproc, Dataplex, Glue, and EMR; transformation tools like dbt or Dataform; and orchestration tools like Airflow, Cloud Workflows, or Step Functions
- Practical experience with streaming platforms like Pub/Sub and Kafka