Create data architecture, pipelines, and analytical solutions to meet software and data science requirements for various PG-MX Healthcare Products
Identify, evaluate, select, and prove out new technologies and toolsets
Create and execute Proofs of Concept and Proofs of Technology
Lead and direct the work of others in data dependent projects
Collaborate with software development, business teams, analysts, and data scientists to establish data storage, pipeline, and structure requirements
Design, develop, and maintain ETL/ELT pipelines using Databricks (PySpark, Delta Lake, SQL); see the illustrative sketch after this list
Implement Data Lakehouse architecture leveraging Databricks Unity Catalog, Delta Live Tables, and Workflows
Build and optimize data ingestion frameworks for structured and unstructured data from diverse sources
Identify and plan for data storage performance requirements
Optimize Databricks clusters, jobs, and queries for performance and cost efficiency
Identify the impact of implementations on other applications and databases
Lead and mentor data engineers on data projects
Implement CI/CD for data pipelines using Git, Databricks Repos, and DevOps tools
Ensure data quality, reliability, and security compliance across environments
Build and evolve Trusted Record systems to manage entities across the enterprise
Design, implement, and evolve solutions around person identity management
Develop and enforce data governance, lineage, and cataloging standards
Identify team development needs and provide targeted training and learning opportunities for team members
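To ground the pipeline responsibilities above, here is a minimal illustrative sketch of the kind of PySpark/Delta Lake ETL this role involves: landing raw files in a bronze table and publishing a cleaned silver table. All paths, schemas, and names (raw_claims, bronze.claims, silver.claims, claim_id) are hypothetical placeholders, not actual PG-MX systems.

```python
# Illustrative sketch only: batch ETL from raw CSV into Delta Lake tables.
# All paths and table names are hypothetical; assumes the bronze and silver
# schemas already exist in the workspace catalog.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims-etl-sketch").getOrCreate()

# Bronze: land raw files as-is, tagging each row with ingestion metadata.
raw = (
    spark.read
    .option("header", "true")
    .csv("/mnt/landing/raw_claims/")  # hypothetical landing path
    .withColumn("_ingested_at", F.current_timestamp())
)
raw.write.format("delta").mode("append").saveAsTable("bronze.claims")

# Silver: enforce types, drop rows missing the key, deduplicate on claim_id.
silver = (
    spark.table("bronze.claims")
    .withColumn("claim_amount", F.col("claim_amount").cast("decimal(12,2)"))
    .filter(F.col("claim_id").isNotNull())
    .dropDuplicates(["claim_id"])
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.claims")
```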
Requirements
Minimum of 5 years Data Engineering experience in an enterprise environment
Bachelor's degree in technology or a related field required
Hands-on experience with Azure data technologies (Databricks, Data Factory, Stream Analytics, Data Lake Storage, Synapse) and on-premises Microsoft tools (SQL Server and SSIS), plus familiarity with AWS data technologies
Proficiency in Python, SQL, and distributed data processing frameworks (Spark), plus familiarity with C#, PowerShell, and APIs
Significant experience with analytical solutions in relational databases such as MS SQL Server, Oracle, and DB2, as well as with NoSQL databases and solutions such as data lakes, document-oriented databases, and graph databases
Strong understanding of data modeling, schema design, and ETL best practices
Experience with data lineage, cataloging, and metadata management in Databricks and Unity Catalog
Skill in data modeling and experience with tools like ER/Studio or Erwin
Familiarity with version control (Git) and DevOps/CI/CD practices
Familiarity with SQL performance tuning and Spark optimization techniques; see the sketch after this list
Excellent problem-solving and communication skills
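As a concrete illustration of the Spark optimization techniques listed above, the sketch below shows two routine tuning moves: filtering on a partition column early so Spark can prune files, and broadcasting a small dimension table to replace a shuffle join with a map-side join. Table and column names (silver.claims, dim_provider, provider_id, service_date) are hypothetical.

```python
# Illustrative sketch only: common Spark tuning techniques.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

# Partition pruning: filter on the (assumed) partition column before joining
# so Spark reads only the relevant files.
claims = spark.table("silver.claims").filter(F.col("service_date") >= "2024-01-01")

# Broadcast join: ship the small dimension table to every executor,
# avoiding a full shuffle of the large fact table.
providers = spark.table("dim_provider")
enriched = claims.join(F.broadcast(providers), on="provider_id", how="left")

enriched.explain()  # inspect the physical plan; expect a BroadcastHashJoin
```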
Tech Stack
AWS
Azure
ETL
MS SQL Server
NoSQL
Oracle
PySpark
Python
Spark
SQL
SSIS
Unity Catalog
Benefits
Health insurance
401(k) matching
Flexible work hours
Paid time off
Discretionary bonus or commission tied to achieved results