Red Cell Partners is an incubation firm building and investing in rapidly scalable technology-led companies. They are seeking a Principal Software Engineer to own the core execution model and platform architecture of Trase OS, ensuring correctness, scalability, and extensibility of the system while leading technical direction across teams.
Responsibilities:
- Architect & lead the core execution model (state machine, lifecycle, resource model, failure semantics)
- Design platform APIs/SDKs connecting workflows, agents, tools, and product surfaces; drive versioning & compatibility
- Guarantee correctness via idempotency, deterministic replays, compensating actions, and data integrity
- Engineer reliability at scale: concurrency controls, rate limits, backpressure, sharding/partitioning, and workload isolation
- Build security & governance into the core: RBAC/ABAC, policy enforcement, fine-grained audit & lineage
- Deliver observability: distributed tracing, structured logs, metrics, and evaluation hooks; build an “explainable trail” of agent actions
- Own quality: design reviews, test strategy (unit, property, chaos), performance baselines, SLOs, incident response, and postmortems
- Mentor & unblock senior engineers; partner with Product, Security, and Customer teams to translate requirements into durable primitives
- Make pragmatic choices on storage, queueing, and compute; create paved roads that accelerate all other teams
- Define system boundaries and reduce cross-service coupling through clear architectural patterns
- Drive platform-wide standards for correctness, reliability, and API design across teams
- Balance short-term delivery with long-term architectural integrity, ensuring the platform evolves without accumulating systemic risk
- Define and drive the long-term technical architecture of Trase OS across teams and domains
- Influence company-wide technical direction for platform and product systems
- Lead cross-team initiatives that shape how workflows, agents, and platform primitives are built and evolve
- Partner with leadership to align technical architecture with product and business strategy
- Mentor senior and staff engineers and raise the bar for system design and architectural thinking
Requirements:
- 12-15+ years of experience building distributed/platform systems, including significant experience defining architecture across teams or domains
- 10+ years owning mission-critical runtimes or workflow/orchestration systems
- Deep expertise with durable execution (e.g., state machines, event sourcing, saga/compensation, idempotency, exactly-once/at-least-once semantics)
- Proven track record with security & governance in production systems (auth, RBAC, audit, policy)
- Hands-on experience with observability (Grafana or equivalent), including trace correlation across async boundaries
- Strong systems design skills across storage, queues, schedulers, and event-driven architectures; performance tuning under load
- Excellence in a modern language (e.g., Go, Rust, Java, or TypeScript) and cloud-native stacks (containers, CI/CD, IaC)
- Comfortable operating in regulated or high-assurance environments; bias toward correctness, clarity, and documentation
- Proven ability to influence technical direction across an organization and drive adoption of architectural standards
- Ability to incorporate advanced LLM capabilities into system design and platform architecture decisions where appropriate
- Prior work on workflow engines (Temporal/Cadence/AWS Step Functions, Argo, Airflow) or serverless runtimes
- Experience with policy engines (OPA), secrets/KMS, or data-handling controls (PII/PHI)
- Experience with ML/LLM evaluation frameworks, tool/plugin architectures, or embedding model governance into execution
- Government or healthcare experience (HIPAA, audit readiness) and experience with multi-tenant isolation