The ReWork Group partners with high-growth startups to build the future. They are seeking a Senior Site Reliability Engineer to develop reliable infrastructure and proactive monitoring solutions as their client scales from thousands to millions of users.

Responsibilities:

Lead incident response and establish sustainable on-call practices, including comprehensive runbooks, blameless postmortems, and systematic improvements that reduce MTTR
Develop and maintain self-service observability solutions using modern monitoring tools that provide actionable insights for troubleshooting and performance optimization
Create and maintain infrastructure as code (Terraform, CloudFormation) that delivers consistent, scalable, and secure cloud environments on AWS
Partner closely with feature teams to architect resilient infrastructure for critical components (databases, networking, async workflows, data pipelines) that scale seamlessly
Work closely with DevX to design and implement robust CI/CD pipelines with advanced deployment strategies (blue/green, canary) that enable teams to ship confidently and rapidly
Advocate for best practices early in feature design, ensuring we design with reliability in mind and future-proof our services

Requirements:

5+ years in SRE or DevOps — or 7+ years in software engineering with a serious infrastructure focus and the scars to prove it
You've led incident response for high-availability production systems — you run tight RCAs, you drive blameless postmortems, and you leave every incident with a team that's smarter than before
You've designed highly available deployment architectures across multiple targets — EC2, Fargate, and beyond — with real expertise in auto-scaling, health checks, and graceful degradation when things get hard
You've implemented monitoring and observability solutions that actually get used — Datadog, Prometheus, ELK, or comparable — and you've made the case internally for why observability isn't optional
Deep AWS fluency and a strong infrastructure-as-code practice — Terraform is your default, not your fallback
You've built and improved CI/CD pipelines that give engineering teams the confidence to ship fast and reliably
