Develop and maintain data pipelines and transformations across the stack, from ingesting transactional data into the data lakehouse to refining it through the layers of the medallion architecture.
Tune performance, storage layout, and cost-efficiency across our data storage and query engines.
Help design and implement new data ingestion patterns and improve platform observability and reliability.
Partner with engineering, product, and operations teams to deliver well-structured, trustworthy data for diverse use cases.
Help establish and evolve best practices for our data infrastructure, from pipeline design to OpenTofu-managed resource provisioning.
Help design and implement a data governance strategy to secure our data lakehouse.
Requirements
8-10+ years of experience building and maintaining data pipelines in production environments
Strong knowledge of the data lakehouse ecosystem, with an emphasis on AWS data services, particularly Glue, S3, Athena/Trino/PrestoDB, and Aurora
Proficiency in Python, Spark, and Athena/Trino/PrestoDB for data transformation and orchestration
Experience managing infrastructure with OpenTofu/Terraform or other Infrastructure-as-Code tools
Solid understanding of data modeling, partitioning strategies, schema evolution, and performance tuning
Comfortable working with cloud-native data pipelines and batch processing (streaming experience is a plus but not required)