Zillow, a leading U.S. real estate platform, is seeking a Senior Software Development Engineer for its Agentic AI Data Services team. The role involves building scalable data products and data platforms that support analytics and AI workflows, with a focus on improving data quality and system reliability.
Responsibilities:
- Build and operate scalable data platforms and shared data products that power Zillow’s next generation of AI experiences, especially Voyager / Zillow AI mode
- Own and evolve trace and event data foundations that support analytics, online and offline evaluation, observability, governance, and future learning workflows
- Design and operate production-grade batch, streaming, and near-real-time data pipelines using Databricks, Spark, Python, SQL, and modern lakehouse patterns
- Improve platform reliability through schema management, data quality validation, alerting, anomaly detection, and production support for business-critical AI data products
- Partner with engineering, analytics, and science teams to turn evolving AI data needs into reusable infrastructure and governed data products
- Help define the long-term architecture for AI data systems, including data contracts, eventing patterns, retention approaches, and cleaner downstream interfaces
Requirements:
- 5+ years of experience in big data engineering, data platform engineering, machine learning engineering, or a closely related field
- Strong proficiency in Python and SQL
- Experience building production-grade data systems on distributed platforms such as Databricks and Spark
- Experience designing and operating scalable batch, streaming, or near-real-time pipelines and datasets for analytics, observability, and AI or ML use cases
- Experience supporting ML, LLM, or agentic AI systems in production through high-quality data products, governance, and operational rigor
- Comfortable working with complex, high-volume event, telemetry, trace, or log-style datasets
- Experience with modern architecture patterns such as streaming, incremental processing, data contracts, or lakehouse design
- Track record of improving production reliability through schema evolution, ingestion safeguards, monitoring, alerting, root-cause analysis, and operational ownership
- Ability to communicate clearly and collaborate effectively with engineering, analytics, science, evaluation, and platform teams in fast-moving environments
- Experience with technologies and practices such as Kafka, Spark Structured Streaming, MLflow, OpenTelemetry, CI/CD, and Git-based workflows