Senior Software Engineer, Engine & Distributed Systems
San Francisco, California, United States of America
Full Time
6 hours ago
$220,000 - $240,000 USD
No Visa Sponsorship
Key skills
Airflow, Distributed Systems, Postgres, Python, FastAPI
About this role
Role Overview
Own the execution engine: the runtime, scheduling, and sub-agent parallelization that run every agent on the platform.
Make long-running work durable. Build checkpointing, resumption, and recovery so agents survive failures and restarts and pick up exactly where they left off.
Shape the execution model. Decide how work is scheduled, queued, and moved from synchronous to asynchronous execution, so the platform stays correct and responsive as load grows.
Engineer for scale and reliability. Hold the engine to strict health targets for worker freshness, deploy safety, and drain time, and keep latency and throughput strong as volume grows.
Keep the engine open to the ecosystem. Make it straightforward to bring new agent harnesses, orchestration frameworks, and model capabilities into the runtime.
Requirements
5+ years building backend systems in production, with real depth in distributed systems.
Hands-on experience with durable execution or workflow orchestration (Temporal, Cadence, Airflow, or equivalent), with a way of thinking rooted in idempotency, state machines, and failure recovery.
Strong command of concurrency, queueing, retries, and fault tolerance under load.
Strong in Python and modern backend frameworks (FastAPI or similar), with sound database fundamentals (Postgres or similar).
You're drawn to the correctness problems that everything else quietly depends on.