Temporal Technologies is the company behind an open-source programming model that aims to simplify code and improve application reliability. The company is seeking a Staff Software Engineer to lead the design and implementation of a new persistence layer for Temporal Visibility, with a focus on scalability, performance, safe data migrations, and operational reliability.

Responsibilities:

Re-architect Temporal Visibility at scale
Lead the design and implementation of a new persistence layer for Temporal Visibility, informed by its real-world access patterns (high-volume writes, time-based queries, filtering, sorting, and pagination across long-running workflows)
Evaluate and select the most appropriate storage technologies (e.g., ClickHouse, Elasticsearch, or complementary systems), clearly articulating tradeoffs around indexing models, consistency, cost, latency, and operational complexity
Design schemas, APIs, and query models that make Visibility both powerful and intuitive for customers
Plan and execute online migrations of live Visibility data from existing persistence stores to new backends, at scale and without customer downtime
Design dual-write, backfill, validation, and cutover strategies that prioritize correctness, observability, and rollback safety
Build tooling and automation to validate data integrity and performance throughout the migration lifecycle
Define and own SLOs for Visibility storage and query paths
Profile hot paths, design benchmarks, and lead systematic performance tuning efforts
Build operational playbooks, dashboards, and alerting that make the system understandable and debuggable for on-call engineers
Lead incident reviews and reliability improvements related to persistence and indexing systems
Break down large, ambiguous roadmap initiatives into concrete, executable phases
Author and steward design docs and RFCs through review with peers and stakeholders
Mentor and unblock other engineers working in the persistence and storage domain
Partner closely with Server, Cloud, and Developer Experience teams to land features end-to-end

Requirements:

5 or more years of experience as an 'Arranger' and/or 'Builder/Enhancer' of highly scalable distributed systems
Strong computer science fundamentals in distributed systems, including concurrency, consistency models, and failure modes
Significant experience writing and operating concurrent production systems in Go, Java, or similar languages, at a skill level ranging from the high end of intermediate through advanced or expert
Hands-on experience designing, operating, and tuning ClickHouse and/or Elasticsearch, ideally in self-hosted environments (managed services are a strong plus)
Experience building and running services on AWS. Bonus: Azure and/or GCP experience
Demonstrated ability to lead large, multi-quarter technical initiatives, especially those involving core data infrastructure and live data migrations
Prior contributions to Temporal, Cadence, or other workflow engines
Deep expertise in storage internals (e.g., columnar stores, LSM trees, inverted indexes, transactional logs)
Experience operating multi-region services with ≥99.99% uptime
Strong background in operating and evolving Open Source systems
Experience building Kubernetes controllers and/or CRDs

Staff Software Engineer, Visibility
