Design, build, and maintain scalable data pipelines and infrastructure
Develop batch and streaming systems for ingesting and processing large-scale data
Own core components of the data platform (warehouse, lake, orchestration, tooling)
Implement data modeling, transformation, and versioning frameworks
Ensure data quality, observability, and reliability across pipelines
Optimize systems for performance, cost, and scalability
Build tooling to support reproducible datasets and experimentation
Partner with data scientists and engineers to productionize ML and analytics workflows
Contribute to secure and compliant data infrastructure in cloud environments
Requirements
3–7+ years of experience in data infrastructure or platform engineering
Strong understanding of real-time and batch data processing systems, including stream processing, messaging architectures, and large-scale data workflows (Spark, Kafka, etc.)
Experience with modern data tooling (dbt, Airflow, etc.)
Strong understanding of data modeling, ETL/ELT, reverse ETL, and orchestration
Experience designing for scalability, reliability, and observability
Hands-on experience with cloud platforms (AWS/GCP) and storage systems
Ability to operate in a fast-moving startup environment with high ownership
Familiarity with ML infrastructure or feature stores is a nice-to-have
Experience with data versioning, lineage, or reproducibility systems is a nice-to-have
Background in security, compliance, or high-sensitivity data domains is a nice-to-have
Previous experience at an early-stage startup is a nice-to-have
Tech Stack
Airflow
AWS
ETL
Google Cloud Platform
Kafka
Spark
Benefits
Competitive salary + equity
Health, dental, and vision coverage
Flexible PTO
Opportunity to join at an early stage and shape the company’s foundation