Yahoo is a leading technology company that connects brands and partners with a vast audience. It is seeking a Software Development Engineer II to design, build, and maintain scalable data pipelines and backend services, ensuring the performance and reliability of data solutions that support critical business operations.
Responsibilities:
- Design, develop, and maintain automated, cloud-native ETL/ELT pipelines using GCP services (e.g., Dataflow, Dataproc, Cloud Composer, Pub/Sub, BigQuery)
- Ingest and process structured and unstructured data from diverse sources into BigQuery and other GCP data systems
- Transform, clean, and enrich datasets to support analytics, machine learning, and reporting use cases
- Implement data validation, monitoring, and alerting for pipeline reliability and data quality
- Collaborate with cross-functional teams using Git, Jira, and CI/CD pipelines
- Troubleshoot and resolve production data and backend issues, ensuring system reliability and scalability
- Document system design, data flows, and operational processes for transparency and maintainability
- Mentor junior engineers and contribute to engineering best practices for cloud-based data systems
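As a rough illustration of the "data validation, monitoring, and alerting" responsibility above, a pre-load batch check might look like the following minimal Python sketch. All names here (`validate_batch`, `ValidationResult`, the `user_id` field) are hypothetical, not part of the role's actual codebase:

```python
from dataclasses import dataclass

@dataclass
class ValidationResult:
    # Summary of one batch's quality checks
    row_count: int
    null_ids: int
    passed: bool

def validate_batch(rows, required_field="user_id", min_rows=1):
    """Run basic completeness checks before loading a batch into a warehouse table.

    Counts rows missing the required field and fails the batch if any are
    found or if the batch is smaller than the expected minimum.
    """
    null_ids = sum(1 for r in rows if not r.get(required_field))
    passed = len(rows) >= min_rows and null_ids == 0
    return ValidationResult(row_count=len(rows), null_ids=null_ids, passed=passed)
```

In a real pipeline, a failed `ValidationResult` would typically route the batch to a quarantine location and trigger an alert rather than load it into BigQuery.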
Requirements:
- B.S. or M.S. in Computer Science, Engineering, or related field, or equivalent practical experience
- 3+ years of experience in backend or data engineering
- Proficiency in at least one backend language (Python, Java, or Go preferred)
- Hands-on experience with GCP data services: BigQuery, Dataflow, Dataproc, Cloud Storage, Pub/Sub, and Composer (Airflow)
- Solid understanding of relational and distributed databases, data modeling, and performance optimization
- Familiarity with containerization and orchestration (Docker, Kubernetes/GKE)
- Strong Unix/Linux and shell scripting skills
- Experience building and managing workflow orchestration (Airflow or Cloud Composer)
- Knowledge of CI/CD, logging, and monitoring tools (e.g., Cloud Logging/Monitoring, formerly Stackdriver, or Prometheus)
- Strong problem-solving, debugging, and communication skills with a focus on reliability and scalability
- Exposure to AI/ML workflows or tools such as Vertex AI
- Experience with streaming data processing (Dataflow, Kafka, or Pub/Sub)
- Understanding of concurrency, multithreading, and distributed systems design
- Familiarity with infrastructure-as-code (Terraform, Deployment Manager)
- Familiarity with version control tools (Git)
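As a sketch of the concurrency skills listed above, fan-out ingestion from multiple sources can be expressed with a thread pool. This is purely illustrative; `fetch_source` is a hypothetical stand-in for a real API or storage client:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_source(name):
    # Placeholder: a real implementation would call an external system
    # (e.g., an API or object store) and return its records.
    return [{"source": name, "value": i} for i in range(3)]

def ingest_all(sources, max_workers=4):
    """Fetch every source concurrently on a thread pool and merge results.

    pool.map preserves input order, so results arrive grouped by source
    even though the fetches run in parallel.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = pool.map(fetch_source, sources)
        return [row for batch in batches for row in batch]
```

Thread pools suit I/O-bound fetches like these; CPU-bound transforms would instead be pushed into a distributed engine such as Dataflow or Dataproc.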