Own the technical strategy and execution of migrating large-scale data workloads from GCP to AWS, ensuring continuity, data integrity, and minimal disruption.
Design migration playbooks and serve as the go-to expert for decisions across compute, storage, and orchestration layers during the transition.
Architect and implement scalable batch and streaming data pipelines using Apache Spark, Delta Lake, and the medallion architecture (an illustrative sketch follows this list).
Establish standards for pipeline design, data quality, and observability that the broader engineering organization can build on.
Take accountability for the reliability, performance, and cost-efficiency of production ETL jobs running on AWS (EMR, Glue) against terabyte-scale datasets.
Proactively identify and resolve bottlenecks and technical debt, and pursue opportunities to improve throughput and resilience.
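To make the medallion responsibility above concrete, here is a minimal sketch of a Bronze/Silver/Gold batch flow in Spark with Delta Lake. All paths, table names, column names, and quality rules (s3://example-bucket/..., order_id, order_total) are illustrative placeholders, not details of this role's actual stack.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object MedallionBatch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("medallion-batch-sketch")
          // Delta Lake session extensions; EMR and Glue can be configured to provide these.
          .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
          .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
          .getOrCreate()

        // Bronze: land raw events as-is, tagged with ingestion time.
        val bronze = spark.read.json("s3://example-bucket/raw/orders/")
          .withColumn("_ingested_at", current_timestamp())
        bronze.write.format("delta").mode("append").save("s3://example-bucket/bronze/orders/")

        // Silver: deduplicate and apply basic quality rules.
        val silver = spark.read.format("delta").load("s3://example-bucket/bronze/orders/")
          .dropDuplicates("order_id")
          .filter(col("order_total") >= 0)
        silver.write.format("delta").mode("overwrite").save("s3://example-bucket/silver/orders/")

        // Gold: business-level aggregate for downstream consumers.
        silver.groupBy("customer_id")
          .agg(sum("order_total").as("lifetime_value"), count("order_id").as("order_count"))
          .write.format("delta").mode("overwrite").save("s3://example-bucket/gold/customer_ltv/")

        spark.stop()
      }
    }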
Requirements
Strong, hands-on Scala expertise with solid Python proficiency
Deep experience with Apache Spark for both streaming and batch data processing at scale; a streaming sketch follows this list
Proven track record running production ETL workloads on AWS (EMR, Glue) against terabytes of data
Experience designing and operating data architectures using Delta Lake and the medallion (Bronze / Silver / Gold) pattern
8+ years of data engineering experience, with a history of owning critical infrastructure end-to-end
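As a companion to the batch sketch above, here is a minimal Structured Streaming example landing raw events into a Bronze Delta table. The Kafka broker, topic, and S3 paths are placeholders, and running it assumes the spark-sql-kafka connector and Delta Lake are on the classpath.

    import org.apache.spark.sql.SparkSession

    object BronzeStreamIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("bronze-stream-sketch")
          .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
          .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
          .getOrCreate()

        // Read a Kafka topic; broker and topic names are placeholders.
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "orders")
          .load()
          .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS payload", "timestamp")

        // Append raw events to the Bronze Delta table; the checkpoint makes the stream restartable.
        val query = raw.writeStream
          .format("delta")
          .option("checkpointLocation", "s3://example-bucket/checkpoints/bronze_orders/")
          .outputMode("append")
          .start("s3://example-bucket/bronze/orders_stream/")

        query.awaitTermination()
      }
    }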