Medecision is seeking a Senior Software Engineer for their Data Platform to provide innovative solutions for managing health and care. The role involves designing and building a cloud-native data platform that supports clinical analytics and reporting, while ensuring the reliability and scalability of data pipelines.
Responsibilities:
- Design, develop, and maintain production-grade Python and Java data services and pipelines deployed on Google Cloud Platform, following established architectural conventions, coding standards, and data platform patterns
- Build and evolve Google Cloud Dataflow batch and streaming pipelines for data ingestion, standardization, curation, and analytics load, handling deduplication, validation, member-reference integrity, and incremental/full-reload modes
- Implement and maintain event-driven data workflows using Google Cloud Pub/Sub, including file-complete notification topics, FHIR ingest topics, and Pub/Sub-to-BigQuery export subscriptions
- Design and manage BigQuery datasets, table schemas, partitioning strategies (range-bucket by member partition), clustering, and reporting views — across standard analytics, custom analytics, and curated storage layers
- Build and maintain Cloud Composer (Apache Airflow) DAGs for workflow orchestration — including file ingestion DAGs, test execution DAGs, and third-party processing DAGs (e.g., MEG)
- Develop and maintain Cloud Run microservices (e.g., Ingestion Event Service, Custom Dataset Management Service) and Cloud Functions (inbound GCS bucket triggers)
- Participate in design and code reviews; mentor junior engineers and contribute to shared coding standards, patterns, and team knowledge base
- Collaborate with on-shore and off-shore teams, architects, and tech leads to ensure on-time delivery and best engineering practices
- Contribute to CI/CD pipeline improvements using GitLab CI/CD, including build, test, containerization (Docker), and deployment automation to GCP environments
- Engage proactively in the triage and resolution of escalated production issues — diagnosing failures, investigating root causes, and driving durable fixes with a sense of urgency, clear communication to stakeholders, and a commitment to preventing recurrence
- Follow and comply with all security policies and procedures established by the organization, including adherence to HIPAA regulations and HITRUST requirements
- Own and evolve the end-to-end data ingestion pipeline: from SFTP/GCS file receipt through ingestion, standardization, curation, and load to BigQuery analytics datasets
- Design and implement custom dataset ingestion capabilities — including dynamic schema mapping, configuration-driven pipeline execution, and automatic BigQuery table/view provisioning on dataset activation
- Maintain and improve the Ingestion Event Service (IES), the platform's central tracking service for data ingestion progress, including Pub/Sub async processing and Firestore document management
- Implement FHIR R4 data ingestion workflows via Pub/Sub and GCP Healthcare Datasets, including XSD/schema validation and HL7 resource mapping
- Support population matching data flows, ensuring custom dataset filters integrate correctly with population builder services and BigQuery analytics queries
- Build and maintain reporting dataset views (BI-compatible BigQuery views) for tenant data exposure
- Integrate with third-party clinical analytics engines via GCE VM instance templates, Airflow DAGs, and GCS-based data exchange
- Ensure all data services meet HIPAA requirements: PHI handling, tenant data isolation, audit logging, and data classification
- Demonstrate a solid understanding of AI concepts, capabilities, and limitations as they apply to software engineering workflows, including code generation, test scaffolding, and documentation
- Use Claude Code as a primary productivity tool for code drafting, refactoring, test generation, and technical documentation — applying it with judgment, rigor, and accountability
- Leverage AI-assisted workflows to accelerate implementation, surface edge cases, generate structured artifacts, and conduct at-scale analysis of service dependencies and API contracts
- Contribute to building and exposing MCP-wrapped APIs that enable AI agents to safely interact with platform services
- Maintain strict HIPAA discipline in all AI-assisted work: no real PHI in prompts or AI-generated artifacts; adhere to managed-settings policies and complete mandatory HIPAA + AI training
- Contribute to the team's shared AI knowledge base (validated prompts, skills, and workflows) and participate in the AI Champions community of practice
Requirements:
- Bachelor's degree in Computer Science, Software Engineering, Data Engineering, or equivalent practical experience
- 5+ years of data engineering or backend software engineering experience building production data pipelines and platform services
- Proven hands-on experience with Google Cloud Platform data services: BigQuery (schema design, partitioning, clustering, query optimization), Cloud Storage, Cloud Pub/Sub, Cloud Dataflow (Apache Beam), Cloud Composer (Airflow), Cloud Run, Cloud Functions, Firestore, Cloud SQL (PostgreSQL), and Secret Manager
- Strong proficiency in Java for data pipeline development
- Proficiency in Python for Airflow DAG authoring and automation scripting
- Experience designing and implementing batch and streaming data pipelines — including file-based ingestion, event-driven processing, deduplication, validation, incremental load, and full-reload patterns
- Proficiency with BigQuery data modeling: partitioned and clustered tables, dataset organization (standardized, curated, analytics, custom analytics, reporting layers), and SQL query optimization
- Experience with Apache Airflow / Cloud Composer — authoring, deploying, and maintaining production DAGs with parameterized configurations and robust error handling
- Experience with containerization (Docker) and deploying services to cloud-native environments
- Proficiency with GitLab CI/CD for pipeline automation and multi-environment deployment
- Excellent communication skills — able to articulate technical decisions, participate in design reviews, and collaborate effectively with cross-functional teams
- Familiarity with Datadog for service monitoring, alerting, and observability in a cloud-native data platform
- Familiarity with Sisense or equivalent BI/reporting platforms and BigQuery view-based reporting patterns
- Solid understanding of AI concepts, capabilities, and limitations as they apply to software engineering and product delivery workflows
- Hands-on experience with Claude Code or equivalent AI-assisted tools, used as a primary productivity tool for code generation, refactoring, test scaffolding, and documentation, not just experimentally
- Ability to evaluate AI-generated code critically: identifying hallucinations, logic errors, security gaps, and missing edge cases before they reach production
- Practical understanding of MCP (Model Context Protocol), or a strong willingness to learn it, for building tool wrappers that expose platform APIs to AI agents safely and with appropriate guardrails
- Commitment to responsible AI use: applying AI with judgment, rigor, and personal accountability, consistent with the principle that humans own decisions, agents own toil
- HIPAA discipline in AI-assisted work: understanding of PHI boundaries in AI workflows and commitment to managed-settings policies and mandatory HIPAA + AI training
- Openness to contributing to and learning from a shared AI knowledge base (validated prompts, skills, and workflows) and active participation in the AI Champions community of practice
- Knowledge of HIPAA and experience working in HIPAA-regulated product environments, including PHI handling, data classification, and audit requirements
- Hands-on experience with HAPI FHIR R4 and healthcare interoperability standards (HL7, FHIR resource mapping, validation workflows)
- Understanding of multi-tenant SaaS architecture patterns — tenant context propagation, per-tenant feature flags, and data isolation