Quantum World Technologies Inc. is seeking a GCP Data Engineer with a strong healthcare background. The role involves architecting enterprise data platforms on Google Cloud, focusing on building a GCP BigQuery-based Data Lake and Data Warehouse ecosystem, while ensuring data governance and quality.
Responsibilities:
- Architect and design an enterprise-grade GCP-based data lakehouse leveraging BigQuery, GCS, Dataproc, Dataflow, Pub/Sub, Cloud Composer, and BigQuery Omni
- Define data ingestion, hydration, curation, processing, and enrichment strategies for large-scale structured, semi-structured, and unstructured datasets
- Create data domain models, canonical models, and consumption-ready datasets for analytics, AI/ML, and operational data products
- Design federated data layers and self-service data products for downstream consumers
- Architect batch, near-real-time, and streaming ingestion pipelines using GCP Cloud Dataflow, Pub/Sub, and Dataproc
- Set up data ingestion for clinical (EHR/EMR, LIS, RIS/PACS) datasets including HL7, FHIR, CCD, DICOM formats
- Build ingestion pipelines for non-clinical systems (ERP, HR, payroll, supply chain, finance)
- Architect ingestion from medical devices, IoT, remote patient monitoring, and wearables leveraging IoMT patterns
- Manage on-premises-to-cloud migration pipelines, hybrid cloud data movement, VPN/Interconnect connectivity, and data transfer strategies
- Build transformation frameworks using BigQuery SQL, Dataflow, Dataproc, or dbt
- Define curation patterns including bronze/silver/gold layers, canonical healthcare entities, and data marts
- Implement data enrichment using external social determinants of health (SDOH) data, device signals, clinical event logs, or operational datasets
- Enable metadata-driven pipelines for scalable transformations
- Establish and operationalize a data governance framework encompassing data stewardship, ownership, classification, and lifecycle policies
- Implement data lineage, data cataloging, and metadata management using tools such as Dataplex, Data Catalog, Collibra, or Informatica
- Set up data quality frameworks for validation, profiling, anomaly detection, and SLA monitoring
- Ensure HIPAA compliance and PHI protection through IAM/RBAC, VPC Service Controls, Cloud DLP, encryption, retention policies, and auditing
- Work with cloud infrastructure teams to architect VPC networks, subnetting, ingress/egress, firewall policies, VPN/IPSec, Interconnect, and hybrid connectivity
- Define storage layers, partitioning/clustering design, cost optimization, performance tuning, and capacity planning for BigQuery
- Understand containerized processing (Cloud Run, GKE) for data services
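To illustrate the metadata-driven pipeline pattern named above, here is a minimal sketch: transformation steps are declared as data (a step name plus parameters) rather than hard-coded, so new curation steps can be added without changing the runner. All names here (`drop_nulls`, `rename`, the registry) are illustrative assumptions, not part of any specific GCP service or this role's actual stack.

```python
# Minimal sketch of a metadata-driven transformation pipeline.
# Each step is declared as metadata (transform name + params); the runner
# looks the transform up in a registry and applies steps in order.
from typing import Any, Callable

# Registry of reusable transformations, keyed by name.
TRANSFORMS: dict[str, Callable[..., list[dict[str, Any]]]] = {}

def register(name: str):
    """Decorator that adds a transformation function to the registry."""
    def wrap(fn):
        TRANSFORMS[name] = fn
        return fn
    return wrap

@register("drop_nulls")
def drop_nulls(rows, column):
    # Bronze-to-silver style cleanup: discard rows missing a key field.
    return [r for r in rows if r.get(column) is not None]

@register("rename")
def rename(rows, old, new):
    # Canonicalize a field name (e.g. map a source column to a canonical entity).
    return [{**{k: v for k, v in r.items() if k != old}, new: r[old]} for r in rows]

def run_pipeline(rows, steps):
    """Apply each metadata-declared step in order."""
    for step in steps:
        fn = TRANSFORMS[step["transform"]]
        rows = fn(rows, **step.get("params", {}))
    return rows

# The pipeline itself is pure metadata (in practice loaded from YAML/JSON).
pipeline_meta = [
    {"transform": "drop_nulls", "params": {"column": "patient_id"}},
    {"transform": "rename", "params": {"old": "patient_id", "new": "subject_id"}},
]

raw = [{"patient_id": "p1", "hr": 72}, {"patient_id": None, "hr": 80}]
curated = run_pipeline(raw, pipeline_meta)
print(curated)  # [{'hr': 72, 'subject_id': 'p1'}]
```

In a real deployment the registry entries would wrap BigQuery SQL, Dataflow, or dbt jobs rather than in-memory functions, but the shape is the same: pipelines scale by adding metadata, not code.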
Requirements:
- Strong experience architecting enterprise data platforms on Google Cloud (GCP)
- Deep hands-on expertise in data ingestion, transformation, modeling, enrichment, and governance
- Strong understanding of clinical healthcare data standards, interoperability, and cloud architecture best practices