Lead Data Engineer
Job Summary
We are seeking an experienced and highly skilled Lead Data Engineer to drive the design, development, and delivery of enterprise-scale data platforms and data engineering solutions. The ideal candidate will have deep expertise in Apache Spark, Scala, Databricks, AWS, and modern Lakehouse architectures, with a strong focus on building scalable, resilient, and high-performance data ecosystems.
This role requires hands-on technical leadership in developing metadata-driven data platforms, implementing Databricks Medallion Architecture, optimizing distributed data processing workloads, and establishing robust governance frameworks using Unity Catalog. The candidate should demonstrate strong expertise in Spark/Scala development, distributed systems troubleshooting, and enterprise data integration, while leading cross-functional teams to deliver business-critical data solutions.
Key Responsibilities
- Lead end-to-end delivery of large-scale data engineering initiatives, ensuring high-quality and timely outcomes.
- Architect, design, and implement scalable, cloud-native data platforms using Databricks, Apache Spark (Scala), and AWS services.
- Develop and maintain Bronze, Silver, and Gold layer implementations following Databricks Medallion Architecture best practices.
- Design and build high-performance batch and streaming data pipelines with strong emphasis on scalability, reliability, and maintainability.
- Drive Spark performance tuning and optimization, including query optimization, partitioning strategies, memory management, and cluster utilization.
- Develop and enhance metadata-driven ingestion and processing frameworks to accelerate data onboarding and improve operational efficiency.
- Lead the implementation of Change Data Capture (CDC) solutions, addressing complex edge cases, late-arriving data, and data reconciliation challenges.
- Design robust ingestion frameworks to handle sparse column structures, schema evolution, and dynamic data patterns.
- Establish and enforce data governance, security, and compliance standards through Unity Catalog and enterprise data management practices.
- Design and implement secure cross-account AWS integrations for data sharing, access management, and multi-account architectures.
- Diagnose and resolve complex distributed system failures, data processing bottlenecks, and platform reliability issues.
- Lead Workday HCM integration initiatives, including data extraction, transformation, and downstream system integration.
- Oversee ETL modernization and migration programs, particularly those moving workloads from traditional ETL platforms such as Informatica PowerCenter to cloud-native data platforms.
- Collaborate with data scientists, analysts, architects, and business stakeholders to translate business requirements into scalable technical solutions.
- Mentor and provide technical leadership to engineering teams, promoting best practices in data engineering and software development.
- Manage project timelines, risks, dependencies, resource planning, and stakeholder communications.
Required Skills & Qualifications
- 12+ years of overall IT experience, including at least 5 years of hands-on experience in big data and data engineering.
- Strong expertise in Apache Spark and Scala programming, with proven experience developing enterprise-scale data processing applications.
- Deep understanding of Spark internals, Spark SQL, DataFrames, RDDs, the Catalyst optimizer, and performance tuning techniques.
- Extensive hands-on experience with Databricks Lakehouse Platform and implementation of Medallion Architecture.
- Strong knowledge of distributed computing concepts and troubleshooting complex distributed system failures.
- Experience designing and implementing metadata-driven frameworks and reusable data engineering components.
- Expertise in handling CDC implementations, CDC edge cases, data reconciliation, and incremental processing patterns.
- Strong understanding of sparse column ingestion challenges, schema evolution, and semi-structured data processing.
- Hands-on experience with AWS services such as S3, IAM, Glue, Lambda, EMR, Redshift, and Secrets Manager, as well as cross-account integration patterns.
- Experience implementing and managing Unity Catalog for data governance, security, lineage, and access control.
- Strong understanding of data engineering fundamentals, including data modeling, data quality, orchestration, pipeline design, and data lifecycle management.
- Experience with Workday HCM integrations and enterprise HR data ecosystems.
- Advanced SQL skills and expertise in relational and dimensional data modeling.
- Experience with ETL platforms, particularly Informatica PowerCenter, and with migrating ETL workloads to modern cloud-native architectures.
- Proven experience leading technical teams and delivering complex enterprise data programs.
- Strong analytical, problem-solving, communication, and stakeholder management skills.