DEPLOY is delivering a phased data strategy and AI-enablement program for a client with extensive datasets in Databricks. The Data Engineer will profile those datasets, document their schemas, assess data quality, and build the data pipelines that support the AI-driven data platform.
Responsibilities:
- Profile all 30+ datasets in Databricks: table structures, row counts, data types, distributions, refresh patterns (a profiling sketch follows this list)
- Document schemas with inferred relationships and primary/foreign key candidates
- Assess data quality across dimensions: completeness, consistency, accuracy, freshness
- Analyze historical data behavior to determine which datasets follow snapshot vs. overwrite load patterns
- Support API and integration mapping, including testing data extraction capabilities
- Build standardized ingestion framework and data pipelines (Phase 2)
- Implement data quality gates with automated validation and alerting (Phase 2)
- Support workflow integration, feature engineering pipelines, and ML data products (Phases 3-4)
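To give candidates a concrete feel for the Phase 1 profiling work, here is a minimal sketch of one pass over a dataset in a Databricks notebook. It is an illustration only: it assumes the built-in `spark` session and Delta tables, and the schema name `main.client_raw` is a hypothetical placeholder.

```python
from pyspark.sql import functions as F

# `spark` is the session Databricks provides in every notebook.
# "main.client_raw" is a hypothetical placeholder for the client schema.
SCHEMA = "main.client_raw"

for row in spark.sql(f"SHOW TABLES IN {SCHEMA}").collect():
    table = f"{SCHEMA}.{row.tableName}"
    df = spark.table(table)

    # Structure and volume: row count plus column names and types
    print(table, df.count(), dict(df.dtypes))

    # Completeness: per-column null ratio as a first data-quality signal
    df.select([
        (F.count(F.when(F.col(c).isNull(), c)) / F.count(F.lit(1))).alias(c)
        for c in df.columns
    ]).show(truncate=False)

    # Refresh pattern: Delta history shows whether loads append,
    # overwrite, or merge (the snapshot vs. overwrite question above)
    spark.sql(f"DESCRIBE HISTORY {table}") \
        .select("timestamp", "operation").show(5, truncate=False)
```

In practice the results would be written to a profiling table rather than printed, but the shape of the work is the same.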
Requirements:
- Strong SQL and Python skills
- Experience with Databricks (notebooks, Spark SQL, Delta Lake)
- Hands-on experience with data profiling, data quality assessment, and technical documentation
- ETL/ELT pipeline development experience
- Comfort working in locked-down enterprise environments with restricted internet access
- Comfort with messy data: you'll be making sense of datasets that have limited or no documentation
- Eager to learn AI tooling
- Financial services, lending, or banking data experience
- Experience with Medallion Architecture (bronze/silver/gold patterns; a bronze-to-silver sketch follows this list)
- Familiarity with Power BI as a downstream consumer
- Experience working within VDI-based access environments
- Experience with modern AI tool sets
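For context on the Medallion Architecture item and the Phase 2 quality gates, here is a hedged sketch of a bronze-to-silver promotion guarded by simple validation rules. Every table name, column, and rule below is hypothetical; a real gate would also record metrics and trigger alerting.

```python
from pyspark.sql import functions as F

# `spark` is the Databricks notebook session; all names below are hypothetical.
bronze = spark.table("main.bronze.loans")  # raw lending data as ingested

# Gate 1: halt promotion if the primary-key candidate is not unique
if bronze.count() != bronze.select("loan_id").distinct().count():
    raise ValueError("loan_id is not unique; halting bronze -> silver promotion")

# Gate 2: quarantine rows failing basic validity rules instead of dropping them
valid = bronze.filter(F.col("principal") > 0)
quarantine = bronze.filter((F.col("principal") <= 0) | F.col("principal").isNull())

valid.write.format("delta").mode("append").saveAsTable("main.silver.loans")
quarantine.write.format("delta").mode("append").saveAsTable("main.silver.loans_quarantine")
```

Quarantining failed rows rather than silently dropping them keeps failures visible, which is the point of a quality gate.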