Build and extend batch pipelines using dbt for transformations and Dagster for orchestration, scheduling, and asset-driven lineage (see the orchestration sketch after this list).
Develop and optimize BigQuery data models (dimensional, wide-table, or domain-oriented) to support analytics, experimentation, and reporting use cases.
Advance real-time streaming capabilities by implementing and maintaining Kafka or Pub/Sub + Flink pipelines, primarily using Flink SQL, to deliver low-latency datasets and event-derived metrics (see the streaming sketch after this list).
Design data platform standards: SDLC, naming conventions, modeling patterns, incremental strategies, schema evolution approaches, and best practices for batch + streaming, including CI/CD and testing.
Improve reliability and observability by implementing monitoring, alerting, and SLAs/SLOs for pipelines and data quality.
Partner with analytics, product, and engineering teams to onboard new data sources, define contracts, and deliver trusted datasets.
Own platform operations including performance tuning, data quality, cost optimization, and scaling across both warehouse and streaming systems.
Design a unified serving layer architecture that cleanly exposes consistent, trusted datasets across both batch and streaming systems.
Establish strong data governance, reliability standards, and observability practices.
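To give a feel for the batch stack described above, here is a minimal, illustrative sketch of dbt models orchestrated as Dagster assets. It assumes the dagster-dbt integration and a compiled dbt project; the project path, job name, and cron schedule are hypothetical, not a prescription of how the pipelines are built.

```python
# Illustrative sketch only: a dbt project exposed as Dagster software-defined
# assets on a daily schedule. Paths, names, and the cron expression are hypothetical.
from pathlib import Path

from dagster import Definitions, ScheduleDefinition, define_asset_job
from dagster_dbt import DbtCliResource, dbt_assets

DBT_PROJECT_DIR = Path("analytics")  # hypothetical dbt project location

@dbt_assets(manifest=DBT_PROJECT_DIR / "target" / "manifest.json")
def analytics_dbt_assets(context, dbt: DbtCliResource):
    # Each dbt model becomes a Dagster asset, giving asset-level lineage in the UI.
    yield from dbt.cli(["build"], context=context).stream()

daily_refresh = define_asset_job(name="daily_refresh")  # defaults to all assets

defs = Definitions(
    assets=[analytics_dbt_assets],
    resources={"dbt": DbtCliResource(project_dir=str(DBT_PROJECT_DIR))},
    schedules=[ScheduleDefinition(job=daily_refresh, cron_schedule="0 6 * * *")],
)
```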
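Similarly, for the streaming side, a rough sketch of a Flink SQL pipeline expressed through PyFlink: a JSON-encoded Kafka topic rolled up into per-minute counts. The topic, broker, schema, and metric are invented for illustration, and running it would require the Flink Kafka connector on the classpath.

```python
# Illustrative sketch only: a Flink SQL job (via PyFlink) that reads click events
# from Kafka and emits per-minute counts. Topic, brokers, and schema are hypothetical.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE clicks (
        user_id STRING,
        url STRING,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'clicks',
        'properties.bootstrap.servers' = 'broker:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# One-minute tumbling-window counts: the kind of low-latency, event-derived
# metric the streaming pipelines are expected to serve.
result = t_env.sql_query("""
    SELECT window_start, url, COUNT(*) AS clicks
    FROM TABLE(TUMBLE(TABLE clicks, DESCRIPTOR(event_time), INTERVAL '1' MINUTES))
    GROUP BY window_start, window_end, url
""")
result.execute().print()
```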
Requirements
Strong proficiency in SQL (advanced querying, performance considerations, data modeling).
Proficiency in Python for data engineering tasks (pipeline glue code, libraries, tooling, testing).
Deep familiarity with BigQuery or equivalent cloud-native data warehouse tooling (partitioning/clustering, cost/performance optimization, best practices).
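As an illustration of the partitioning/clustering point above, a small sketch using the google-cloud-bigquery client to create a day-partitioned, clustered table; the project, dataset, and column names are made up.

```python
# Illustrative sketch only: creating a day-partitioned, clustered BigQuery table,
# the kind of layout that keeps scan costs predictable. Names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.analytics.events",  # hypothetical project.dataset.table
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("event_name", "STRING"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
# Partition by date so queries can prune to the days they actually touch...
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_date"
)
# ...and cluster by the columns most often filtered on, to cut bytes scanned.
table.clustering_fields = ["customer_id", "event_name"]

client.create_table(table, exists_ok=True)
```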