Own the reliability and scalability of our platform—design, build, and operate the infrastructure that keeps ConductorOne running for customers who depend on us
Build observability that drives action—create monitoring, alerting, and tooling that helps teams understand system behavior and respond to incidents quickly
Drive operational excellence across engineering—partner with product teams to ensure new features are built with reliability in mind from the start
Automate relentlessly—if you're doing something twice, build a system to do it for you. We believe in infrastructure as code and eliminating toil.
Respond to and learn from incidents—lead incident response, conduct blameless postmortems, and drive systemic improvements
Plan and execute infrastructure projects with incremental deliverables—you'll assess technical risks, communicate tradeoffs, and ship iteratively

A track record of building and operating production systems at scale—you've kept real systems running for real customers
Deep experience with cloud infrastructure (AWS, GCP, or similar) and infrastructure-as-code (Terraform, Pulumi, or similar)
Strong programming skills in Go, Python, or similar—you write tools and automation, not just scripts
Experience with Kubernetes and container orchestration in production
Strong systems thinking—you understand how components interact and where failures cascade
High agency—you figure out what needs to be built, not just how to build what you're told. You move fast and unblock yourself.
Deep understanding of observability—you know how to instrument systems, build dashboards, and create alerts that actually matter
Experience with AI-assisted development (Claude Code, Cursor, Copilot, or similar)—you're already using these tools and are excited about what's next
Clear, persuasive communication—you can explain complex systems to diverse audiences and drive alignment during incidents
Ego in check—you care about getting it right, not being right

Site Reliability Engineer

Key skills