Joblet-AI is seeking a Site Reliability Engineer to keep production systems reliable, observable, and performant. The role combines software engineering with operations to automate processes and improve system reliability.

Responsibilities:

Design and operate systems for high availability and performance
Build and maintain observability tooling (logging, metrics, tracing)
Define and track SLOs, SLIs, and error budgets
Lead incident response and post-mortem reviews
Automate operational toil through tooling and platform improvements
Partner with application teams on production readiness

Requirements:

4+ years in SRE, DevOps, or infrastructure engineering
Strong scripting and software engineering skills (Python, Go, or similar)
Deep experience with cloud platforms (AWS, GCP, Azure)
Hands-on experience with Kubernetes, Terraform, and observability platforms
Experience leading incident response in production environments
Strong understanding of distributed systems
