Design, implement, and maintain data models, pipelines, and architecture to support current and future product needs – focusing on healthcare datasets such as claims, EHR, BCDA, and CCLF.
Design processes to consolidate data from multiple sources into a single source of truth.
Contribute to data governance policies, standards, and procedures to ensure data quality, security, and compliance with HIPAA and other regulatory requirements.
Identify and resolve performance bottlenecks in ETL/ELT pipelines – ensuring the long-term scalability of those pipelines.
Work closely with Data Scientists, Engineers, and business stakeholders to understand requirements and translate them into high-value technical solutions.
Requirements
2+ years of engineering experience
Core Competencies:
Python
SQL
Apache Spark and PySpark
Data Modeling & Warehousing Concepts
OLAP vs OLTP
Normalized vs denormalized data (including the One Big Table pattern)
Slowly Changing Dimensions (SCDs)
Medallion Architecture
Tools & Technology
Experience working with dbt or SQLMesh
Experience with at least one major data platform and warehousing tool such as Databricks or Snowflake
Security & Compliance
Awareness of Data Governance and Security concepts: discoverability, Role-Based Access Control (RBAC)
Tokenization
Prior experience working with PHI/PII and experience working in audited environments
Healthcare Domain Knowledge
HIPAA, SOC 2, and HITRUST compliance requirements and implementation
Healthcare interoperability standards (FHIR, HL7)
EHR operational data and claims data processing experience
Note: Healthcare experience is not required if the applicant has sufficient experience in other highly regulated environments and demonstrates an advanced understanding of data security and governance.
Tech Stack
Apache Spark
ETL/ELT
PySpark
Python
SQL
Benefits
Health, Dental, Vision & Voluntary Benefits
Competitive Salary & Bonus Plans
401k Retirement Savings
Flexible PTO & 10 Paid Holidays
Flexible Work Hours
Equity Shares
Paid Leave Programs
Marketplace for discounted retail and entertainment