
Location: Remote (USA)
Job Type: Full-Time / Permanent
Industry: Healthcare / Medicare Claims Engineering
As a Staff Platform Engineer, you will be the technical authority for the design, deployment, and operational excellence of our Temporal.io platform. This platform is the backbone of a high-stakes strategic initiative: migrating our legacy RPA portfolio (Automation Anywhere) to a modern, code-and-config-driven workflow engine powered by Python, GKE, and Playwright.
This is a senior individual contributor role where you will own the Temporal Server topology and the Config Interpreter. You’ll build the bridge that turns canvas-authored visual workflows into durable Temporal executions without requiring hand-written Python for every new use case. You will influence architecture across teams, ensuring our Medicare claims automation is resilient, HIPAA-compliant, and highly scalable.
Platform Architecture: Own the Temporal Server cluster on GKE, including service topology, namespace strategy, persistence (Cloud SQL PostgreSQL with HA), and history shard sizing.
The Config Interpreter: Design and maintain the generic Python workflow that loads versioned JSON/YAML configs to dispatch pre-built activities, eliminating per-workflow codegen.
Framework Design: Define Python-based workflow conventions: determinism, retry/timeout policies, heartbeating, signals for human-in-the-loop, and state queries.
Activity Ecosystem: Own the activity library contract and domain-plugin patterns (SQL, scraping, SOAP envelopes) allowing triggers to register logic without modifying shared executors.
Operational Excellence: Lead capacity planning, performance tuning, and RCA for high-throughput claims workloads. Define standards for observability (Prometheus/Grafana) and distributed tracing.
Deployment & Governance: Champion IaC (Terraform) and CI/CD for worker pools, including blue-green deployments with task queue draining. Set HIPAA-aligned standards for PHI handling and archival.
Technical Leadership: Mentor senior engineers, lead deep-dive design reviews, and influence engineering direction through technical authority rather than formal management.
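For candidates who want a concrete picture of the Config Interpreter and activity-plugin responsibilities above, the following is a minimal sketch of the general pattern, not our actual implementation. It uses plain Python with no Temporal dependency; the registry, the activity names (`sql.lookup_claim`, `claims.flag_for_review`), the config schema, and `run_config` are all hypothetical and chosen only for illustration. In a real Temporal workflow, the dispatch step would presumably go through the SDK's activity-execution call (for example `workflow.execute_activity` in the Python SDK) with retry and timeout policies rather than a direct function call.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry: domain plugins register activities by name,
# so the shared interpreter never has to be modified.
ACTIVITY_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_activity(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that adds a function to the registry under `name`."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        if name in ACTIVITY_REGISTRY:
            raise ValueError(f"activity {name!r} is already registered")
        ACTIVITY_REGISTRY[name] = fn
        return fn
    return decorator


# Two illustrative activities; real ones would query a database,
# drive a browser, or build a SOAP envelope.
@register_activity("sql.lookup_claim")
def lookup_claim(ctx: Dict[str, Any], claim_id: str) -> Dict[str, Any]:
    return {"claim_id": claim_id, "status": "pending"}


@register_activity("claims.flag_for_review")
def flag_for_review(ctx: Dict[str, Any], reason: str) -> Dict[str, Any]:
    claim = ctx["lookup"]  # result of the earlier step with id "lookup"
    return {**claim, "status": "review", "reason": reason}


def run_config(raw_config: str) -> Dict[str, Any]:
    """Generic interpreter: load a versioned config and dispatch each step."""
    config = json.loads(raw_config)
    if config.get("version") != 1:
        raise ValueError("unsupported config version")
    ctx: Dict[str, Any] = {}
    for step in config["steps"]:
        handler = ACTIVITY_REGISTRY[step["activity"]]
        # Each step's result is stored under its id for later steps to read.
        ctx[step["id"]] = handler(ctx, **step.get("args", {}))
    return ctx


# Hypothetical config, as a canvas tool might export it.
EXAMPLE_CONFIG = json.dumps({
    "version": 1,
    "steps": [
        {"id": "lookup", "activity": "sql.lookup_claim",
         "args": {"claim_id": "C-100"}},
        {"id": "flag", "activity": "claims.flag_for_review",
         "args": {"reason": "missing modifier"}},
    ],
})

if __name__ == "__main__":
    print(run_config(EXAMPLE_CONFIG)["flag"])
```

The point of the design is that a new use case needs only a new config and, at most, a newly registered activity; the interpreter loop itself stays unchanged.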
Mandatory Technical Expertise: Hands-on production experience with Temporal.io (or Cadence). You must understand the internals of durable execution.
Distributed Systems: Multiple years of experience building/operating large-scale, high-availability transactional systems.
Python Mastery: Deep expertise in Python, specifically async programming and reasoning about determinism and replay semantics.
Infrastructure: Hands-on experience with PostgreSQL at scale (connection pooling, tuning) and container orchestration.
Reliability Engineering: Strong understanding of fault tolerance, idempotency, and disaster recovery patterns.
Collaborative Mindset: Proven ability to partner across Security, Networking, and Application teams to deliver platform-level guardrails.
Cloud Native: Production experience with Google Cloud Platform (GCP): GKE, Cloud SQL, Cloud Logging, and Managed Prometheus.
Temporal Advanced Usage: Experience with namespaces, advanced visibility (Elasticsearch), Archival, and Helm-based deployments.
DevOps/IaC: Expertise in Terraform, Helm, and Kubernetes autoscaling based on custom metrics.
Regulated Industry: Familiarity with HIPAA/PHI data handling and audit logging within a healthcare context.
RPA Migration: Experience migrating workloads from legacy RPA tools (Automation Anywhere, UiPath) to code-first orchestration.