
Job Title: Data Engineer (Ab Initio to Databricks Modernization)
Location: Wilmington, DE (5 days onsite)
Overview
We are seeking a Data Engineer to lead the modernization of legacy ETL systems by migrating Ab Initio workflows to scalable, modular PySpark pipelines on Databricks. The role involves transforming complex data ecosystems into cloud-native architectures while ensuring data integrity, performance, and reliability.
Key Responsibilities
ETL Modernization & Development
Analyze and migrate legacy ETL workflows from Ab Initio to PySpark-based pipelines
Design and develop scalable data pipelines on Databricks
Refactor monolithic processes into modular, reusable components (see the sketch following this list)
Leverage existing enterprise datasets to avoid redundancy
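For context, a minimal sketch of the modular, reusable PySpark style this refactoring targets on Databricks; the table and column names (raw.customer_raw, curated.customer_enriched, customer_id, email) are hypothetical and shown only to illustrate the pattern:

```python
# Sketch of a modular PySpark pipeline on Databricks. Each transformation is a
# plain function over DataFrames so it can be reused and unit-tested in isolation.
# All table and column names here are hypothetical.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def standardize_customers(df: DataFrame) -> DataFrame:
    """Reusable cleansing step: trim keys, normalize email casing, drop duplicates."""
    return (
        df.withColumn("customer_id", F.trim(F.col("customer_id")))
          .withColumn("email", F.lower(F.col("email")))
          .dropDuplicates(["customer_id"])
    )


def enrich_with_accounts(customers: DataFrame, accounts: DataFrame) -> DataFrame:
    """Reusable join step kept separate from cleansing so each part is testable."""
    return customers.join(accounts, on="customer_id", how="left")


if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()   # supplied by the Databricks runtime
    customers = spark.table("raw.customer_raw")  # hypothetical source table
    accounts = spark.table("raw.account_raw")    # hypothetical source table
    result = enrich_with_accounts(standardize_customers(customers), accounts)
    result.write.mode("overwrite").saveAsTable("curated.customer_enriched")
```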
Data Integration & Processing
Build and maintain ETL/ELT pipelines integrating data from Snowflake and other sources (illustrated in the sketch following this list)
Process and publish enriched datasets for downstream applications
Support batch and near real-time data processing
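As an illustration, a minimal read-enrich-publish sketch using the Spark Snowflake connector bundled with Databricks runtimes; the connection values, table names, and columns below are hypothetical:

```python
# Sketch: read a Snowflake table into Databricks, enrich it, and publish a Delta
# table for downstream applications. Connection values and names are hypothetical;
# in practice credentials would come from a Databricks secret scope, not literals.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",   # hypothetical account URL
    "sfUser": "ETL_SVC_USER",                      # placeholder; use a secret scope
    "sfPassword": "<resolved-from-secret-scope>",  # placeholder; use a secret scope
    "sfDatabase": "SALES",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

orders = (
    spark.read.format("snowflake")    # connector bundled with Databricks runtimes
    .options(**sf_options)
    .option("dbtable", "ORDERS")      # hypothetical source table
    .load()
)

# Enrich and publish for downstream consumers (target table name is hypothetical).
enriched = orders.withColumn("ORDER_MONTH", F.date_trunc("month", F.col("ORDER_DATE")))
enriched.write.format("delta").mode("overwrite").saveAsTable("published.orders_enriched")
```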
Data Lineage & Optimization
Create end-to-end data lineage and data flow diagrams
Identify redundancies and drive process consolidation and optimization
Ensure adherence to data governance and quality standards
Testing & Validation
Develop unit, integration, and reconciliation testing frameworks
Perform dual-run comparisons against legacy system outputs (see the sketch following this list)
Validate outputs in UAT and pre-production environments
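For context, a minimal dual-run reconciliation sketch comparing legacy and modern outputs of the same logical dataset; the table names and the matching strategy (full-row comparison via exceptAll over identically structured tables) are assumptions for illustration:

```python
# Sketch: dual-run reconciliation of a legacy (Ab Initio-produced) extract against
# the new PySpark pipeline output. Table names are hypothetical, and both tables
# are assumed to share the same schema and column order.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

legacy = spark.table("legacy.customer_enriched")   # loaded from the legacy run
modern = spark.table("curated.customer_enriched")  # produced by the new pipeline

# Row-count parity check.
legacy_count, modern_count = legacy.count(), modern.count()
print(f"legacy={legacy_count} modern={modern_count} delta={modern_count - legacy_count}")

# Full-row differences in both directions; empty results mean the runs reconcile.
only_in_legacy = legacy.exceptAll(modern)
only_in_modern = modern.exceptAll(legacy)

if only_in_legacy.count() == 0 and only_in_modern.count() == 0:
    print("dual-run reconciliation passed")
else:
    only_in_legacy.limit(20).show()  # sample mismatches for investigation
    only_in_modern.limit(20).show()
    raise AssertionError("dual-run mismatch detected")
```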
Deployment & Operations
Support cutover and migration strategy from legacy systems
Decommission legacy workflows and optimize scheduling (e.g., Control-M)
Develop runbooks, monitoring, and operational documentation
Collaboration
Work with data architects, analysts, and downstream application teams
Coordinate user acceptance testing (UAT/FAT) and stakeholder sign-offs