Obsidian Security is a company focused on securing SaaS applications for modern businesses. The Staff Site Reliability Engineer will lead the reliability strategy for a complex multi-tenant SaaS platform, ensuring proactive detection of system failures and improving incident response processes.

Responsibilities:

Map and instrument critical system paths for top-tier enterprise customers
Build connector health models that classify issues as internal defects (“our bug”), upstream SaaS outages, or expected sparse/low-signal scenarios
Establish tiered incident communication: a public status page for all customers and direct outreach to high-priority accounts
Define and begin rollout of SLI/SLO standards across microservices
Develop self-service instrumentation tooling enabling engineering teams to own observability
Implement baseline-aware anomaly detection across all connectors, going beyond static thresholds
Mature incident response processes, including structured post-mortems and continuous reliability improvements

Requirements:

7+ years in SRE, production engineering, or similar roles
2+ years operating as a technical lead
Deep expertise with AWS and/or GCP
Deep expertise with Kubernetes and Helm
Deep expertise with observability stacks (Prometheus, Grafana)
Deep expertise with CI/CD systems (GitLab CI/CD, ArgoCD)
Proven experience building monitoring for multi-tenant SaaS systems with complex data pipelines
Strong debugging skills across distributed microservices and legacy systems
Hands-on engineering mindset: able to instrument services directly, not just configure tooling
Track record of building or significantly improving incident detection and response systems
Experience in B2B SaaS serving enterprise or financial customers
Familiarity with third-party SaaS connector ingestion patterns
Experience building anomaly detection systems or baseline-aware alerting
Experience implementing customer-facing status pages and incident communication frameworks