Own the architecture and operations of our data lakehouse, including object storage, table formats, maintenance, and query engine integrations
Build and maintain the infrastructure layer that transforms and serves data reliably at scale—from raw landing zones through to curated, queryable datasets
Partner with product engineering to establish data contracts and schema standards around event telemetry, ensuring data arrives in the lakehouse in a form that's reliable and ready for downstream use
Drive decisions on data platform architecture, tooling, and engineering best practices across storage, compute, and access layers
Enhance observability and monitoring of data infrastructure, including pipeline reliability, data freshness, and system performance
Collaborate cross-functionally with Analytics, Infrastructure, and Product teams to understand their data needs and deliver impactful platform solutions
Provide product feedback by dogfooding new data infrastructure and AI technology
Requirements
Expert-level SQL and Python skills
5+ years of experience as a data engineer and 8+ years of total software engineering experience (including data engineering roles)
Strong knowledge of data lakehouse architecture, including storage layer design, table formats, and compute/query engine integration
Experience defining and enforcing data contracts or schema standards in collaboration with upstream engineering teams
Hands-on experience with modern orchestration tools like Airflow, Dagster, or Prefect
Working knowledge of cloud infrastructure tooling, including Terraform, Helm, and Kubernetes
Hands-on experience running Apache Spark in production, including job tuning, cluster sizing, and managing failures at scale
A bias for action—able to stay focused and prioritize effectively in an ambiguous environment
Tech Stack
Airflow
Apache Spark
Helm
Kubernetes
Python
SQL
Terraform
Benefits
Unlimited vacation time with a culture that actively encourages time off