Design and build production-grade data pipelines in Databricks using Spark/PySpark and SQL.
Develop and maintain an Analytics ID stitching pipeline using deterministic and probabilistic matching techniques across multiple customer data sources.
Build and manage modular data marts (Identity, Behavior, Demographics) with independent refresh cadences.
Implement and maintain a scalable feature store supporting downstream analytics and data science use cases.
Own the end-to-end data lifecycle: ingestion, transformation, validation, deployment, monitoring, and optimization.
Develop data quality frameworks including schema drift detection, anomaly monitoring, match-rate validation, and automated deduplication audits.
Implement CI/CD processes for multi-environment promotion (dev/staging/prod) in Databricks environments.
Orchestrate workflows and manage job dependencies using Databricks Workflows or similar tools.
Collaborate closely with Data Architects and client stakeholders to translate business rules into scalable technical solutions.
Produce comprehensive technical documentation including data contracts, lineage maps, architecture diagrams, and operational runbooks.
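The deterministic side of the ID stitching responsibility above can be sketched in plain Python: records that share an exact match key (e.g. email or device ID) are linked into one identity cluster, which is naturally modeled as union-find over record pairs. This is a hedged illustration only; the schema, key names, and `stitch_ids` function are hypothetical, and a production Databricks pipeline would express the same logic in PySpark over hashed keys at scale.

```python
from collections import defaultdict

def stitch_ids(records):
    """Deterministically stitch customer records into identity clusters.

    Records sharing any exact match key (here: email or device_id) get the
    same analytics ID via union-find. Hypothetical schema for illustration.
    """
    parent = list(range(len(records)))

    def find(i):
        # Find cluster root with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Link records that share a value on any deterministic match key.
    for key in ("email", "device_id"):
        seen = {}
        for i, rec in enumerate(records):
            val = rec.get(key)
            if val is None:
                continue
            if val in seen:
                union(i, seen[val])
            else:
                seen[val] = i

    # Group record IDs by cluster root and mint a synthetic analytics ID.
    clusters = defaultdict(list)
    for i, rec in enumerate(records):
        clusters[find(i)].append(rec["record_id"])
    return {f"aid_{root}": ids for root, ids in clusters.items()}

records = [
    {"record_id": "r1", "email": "a@x.com", "device_id": "d1"},
    {"record_id": "r2", "email": "a@x.com", "device_id": "d2"},
    {"record_id": "r3", "email": "b@x.com", "device_id": "d2"},
    {"record_id": "r4", "email": "c@x.com", "device_id": None},
]
clusters = stitch_ids(records)
# r1 and r2 share an email, r2 and r3 share a device, so r1-r3 collapse
# into one identity; r4 matches nothing and stands alone.
```

Probabilistic matching would extend this by treating fuzzy similarity scores (name, address) above a threshold as edges, and match-rate validation (per the data quality bullet) would track the fraction of records landing in multi-record clusters per run.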
Requirements
4+ years of experience in Data Engineering building production-grade data pipelines at scale.
Strong hands-on experience with Databricks and Apache Spark (PySpark preferred).