Lumenalta partners with organizations to build scalable technology solutions and is seeking an experienced Data Validation Engineer. This role focuses on ensuring data reliability and pipeline integrity across a modern lakehouse platform by designing reconciliation processes, maintaining automated data quality test suites, and developing regression testing strategies.
Responsibilities:
- Design and execute data reconciliation processes to validate consistency between source systems, migration outputs, and target datasets
- Build and maintain automated data quality test suites using Great Expectations, Deequ, or similar frameworks integrated directly into pipeline workflows
- Develop regression testing strategies for data pipelines, ensuring that platform changes and migrations do not introduce data quality issues
- Define and run performance benchmarking tests to validate pipeline throughput, query latency, and compute resource utilization
- Partner with Data and ETL Engineers to embed validation checkpoints throughout ingestion and transformation workflows
- Build dashboards and alerting mechanisms to surface data quality issues in real time and track quality metrics over time
- Document test plans, validation rules, and quality standards to support auditability, team onboarding, and stakeholder reporting
Requirements:
- 3–5+ years of experience in data quality, QA engineering, or data engineering roles with a focus on validation, testing, and data integrity
- Proven experience designing reconciliation frameworks to compare datasets across systems, pipeline stages, or migration checkpoints
- Hands-on experience with Great Expectations, Deequ, or equivalent data quality tooling in production environments
- Familiarity with building regression test suites for data pipelines, including schema, value-level, and statistical checks
- Experience profiling and benchmarking data pipeline performance, identifying bottlenecks, and recommending optimizations
- Experience supporting data migrations with end-to-end validation strategies across source and target systems is a strong plus
- Detail-oriented with strong problem-solving skills and a genuine commitment to data integrity and platform trust