Lead Data Engineer

Job Summary

We are seeking an experienced and highly skilled Lead Data Engineer to drive the design, development, and delivery of enterprise-scale data platforms and modern data engineering solutions. The ideal candidate will possess deep expertise in Apache Spark, Scala, Databricks, AWS, and modern Lakehouse architectures, with a strong focus on building scalable, resilient, and high-performance data ecosystems.

This role requires hands-on technical leadership in developing metadata-driven data platforms, implementing the Databricks Medallion Architecture, optimizing distributed data processing workloads, and establishing robust governance frameworks using Unity Catalog. The candidate should be proficient in Spark/Scala development, distributed systems troubleshooting, and enterprise data integration, and able to lead cross-functional teams in delivering business-critical data solutions.

Key Responsibilities

Lead end-to-end delivery of large-scale data engineering initiatives, ensuring high-quality and timely outcomes.
Architect, design, and implement scalable, cloud-native data platforms using Databricks, Apache Spark (Scala), and AWS services.
Develop and maintain Bronze, Silver, and Gold layer implementations following Databricks Medallion Architecture best practices.
Design and build high-performance batch and streaming data pipelines with strong emphasis on scalability, reliability, and maintainability.
Drive Spark performance tuning and optimization, including query optimization, partitioning strategies, memory management, and cluster utilization.
Develop and enhance metadata-driven ingestion and processing frameworks to accelerate data onboarding and improve operational efficiency.
Lead the implementation of Change Data Capture (CDC) solutions, addressing complex edge cases, late-arriving data, and data reconciliation challenges.
Design robust ingestion frameworks to handle sparse column structures, schema evolution, and dynamic data patterns.
Establish and enforce data governance, security, and compliance standards through Unity Catalog and enterprise data management practices.
Design and implement secure cross-account AWS integrations for data sharing, access management, and multi-account architectures.
Diagnose and resolve complex distributed system failures, data processing bottlenecks, and platform reliability issues.
Lead Workday HCM integration initiatives, including data extraction, transformation, and downstream system integration.
Oversee ETL modernization and migration programs, particularly transitioning from traditional ETL platforms such as Informatica PowerCenter to cloud-native data platforms.
Collaborate with data scientists, analysts, architects, and business stakeholders to translate business requirements into scalable technical solutions.
Mentor and provide technical leadership to engineering teams, promoting best practices in data engineering and software development.
Manage project timelines, risks, dependencies, resource planning, and stakeholder communications.

Required Skills & Qualifications

12+ years of overall IT experience, including at least 5 years of hands-on experience in Big Data and Data Engineering.
Strong expertise in Apache Spark and Scala programming, with proven experience developing enterprise-scale data processing applications.
Deep understanding of Spark internals, Spark SQL, DataFrames, RDDs, Catalyst Optimizer, and performance tuning techniques.
Extensive hands-on experience with the Databricks Lakehouse Platform, including implementing the Medallion Architecture.
Strong knowledge of distributed computing concepts and troubleshooting complex distributed system failures.
Experience designing and implementing metadata-driven frameworks and reusable data engineering components.
Expertise in CDC implementations, including edge cases, data reconciliation, and incremental processing patterns.
Strong understanding of sparse column ingestion challenges, schema evolution, and semi-structured data processing.
Hands-on experience with AWS services such as S3, IAM, Glue, Lambda, EMR, Redshift, Secrets Manager, and cross-account integration patterns.
Experience implementing and managing Unity Catalog for data governance, security, lineage, and access control.
Strong understanding of data engineering fundamentals, including data modeling, data quality, orchestration, pipeline design, and data lifecycle management.
Experience with Workday HCM integrations and enterprise HR data ecosystems.
Advanced SQL skills and expertise in relational and dimensional data modeling.
Experience with ETL platforms, particularly Informatica PowerCenter, and migration to modern cloud-native architectures.
Proven experience leading technical teams and delivering complex enterprise data programs.
Strong analytical, problem-solving, communication, and stakeholder management skills.
