Design and build scalable data pipelines using Databricks and Spark to ingest, transform, and unify data from multiple enterprise systems
Develop and maintain medallion architecture (Bronze, Silver, Gold) data models to create reliable and performant “Golden Record” datasets
Implement data normalization, mapping, and entity resolution techniques (e.g., fuzzy matching, XREF tables) to unify asset data across disparate systems
Build data workflows to detect and surface Shadow IT across financial, identity, endpoint, and network signals, and integrate the results into CMDB systems
Partner with IT, Security, Finance, Procurement, and GRC teams to define and enforce data standards for critical CMDB attributes (e.g., ownership, approval status, lifecycle)
Develop and maintain data integrations and APIs to synchronize curated datasets into operational systems such as ServiceNow and Jira Assets
Monitor, troubleshoot, and improve data quality, reliability, and observability across the data platform
Requirements
9+ years of experience building and maintaining data pipelines and large-scale data platforms
Strong experience with Databricks, Apache Spark, and SQL for distributed data processing and transformation
Experience designing data models and architectures such as medallion architecture, data lakes, or lakehouse systems
Proficiency in Python or similar programming languages for data engineering and ETL development
Experience integrating data from multiple enterprise systems (e.g., SaaS tools, financial systems, identity systems)
Strong understanding of data quality, data governance, and entity resolution techniques across heterogeneous datasets
Excellent collaboration and communication skills, with experience working cross-functionally with technical and non-technical stakeholders