GE HealthCare is focused on building clinical‑grade data platforms that drive meaningful insights in healthcare. In this role, you will design and evolve cloud‑native data platforms on AWS that ensure secure data ingestion, governance, and analytics, helping to improve patient outcomes and support stakeholders across the health sector.
Responsibilities:
- Design and evolve cloud‑native data platforms using S3, Lake Formation, Glue (Data Catalog/ETL), Athena, Spark on EMR/EKS, Redshift (including Redshift Serverless), and Kinesis/MSK for streaming
- Define lake and lakehouse patterns, real‑time and batch pipelines, and governed self‑service analytics capabilities
- Implement PHI tokenization/pseudonymization, fine‑grained (column‑ and row‑level) access controls, sensitive‑data discovery with Macie, encryption at rest with KMS, and data retention/lineage strategies using Glue and tags
- Apply data loss prevention (DLP) and other privacy‑preserving controls aligned with HIPAA, GDPR, HITRUST, and FDA/ISO frameworks
- Enable data exchange using FHIR, DICOM, and HL7, and route device telemetry through IoT Core into the streaming and lake layers
- Build governed ML workflows with SageMaker Pipelines, Model Registry, lineage tracking, explainability, and bias reporting
- Support dataset versioning and incorporate human‑in‑the‑loop processes when needed
- Lead data mesh/product governance, enable Redshift/Athena consumption, support DataZone cataloging and access workflows, and utilize Clean Rooms for privacy‑preserving collaboration
- Architect for resiliency across multi‑AZ/multi‑region deployments, including S3 replication, lifecycle management, partitioning/compaction, and cost‑efficient performance tuning
- Maintain validation packages for regulated analytics and AI pipelines, including traceable lineage and 21 CFR Part 11 evidence
Requirements:
- 12+ years of experience in data or analytics platforms
- 6+ years leading AWS data architecture at scale
- Deep expertise with S3, Lake Formation, Glue, Athena, EMR, Redshift, Kinesis/MSK, and SageMaker
- Experience governing PHI and regulated ML workflows
- Experience with table formats such as Apache Iceberg, Delta Lake, or Hudi, and ACID‑on‑lake patterns
- Knowledge of change data capture (CDC) ingestion with AWS DMS
- Familiarity with curated imaging pipelines (DICOM) and vector search for clinical text/notes
- Experience applying FinOps practices to data platforms (storage tiering, compression, query optimization)