Satsuma is a commerce iPaaS that builds merchant-specific APIs and enables retailers to connect their full commerce stack. The company is seeking a Senior Site Reliability Engineer to own the reliability and operational posture of its multi-cloud infrastructure and keep its systems running smoothly and efficiently.
Responsibilities:
- Own infrastructure across AWS, GCP, and Azure environments
- Build and maintain CI/CD pipelines, observability stacks, and incident response workflows
- Define and enforce SLOs/SLIs; lead postmortems
- Author and maintain IaC (Terraform preferred)
- Write internal tooling and automation using AI-assisted development workflows
- Partner closely with engineering on reliability reviews and architecture decisions
Requirements:
- 5-8 years of experience in SRE, DevOps, or infrastructure engineering
- Hands-on experience across at least two major cloud providers
- Strong skills in Kubernetes, Terraform, and observability tooling (Datadog, Grafana, or equivalent)
- Comfortable reading and editing code; able to ship scripts and internal tools
- Experience with AI-assisted development (Copilot, Cursor, Claude Code)
- On-call maturity: you've owned incidents end-to-end and made systems better afterward
- Prior experience at a startup or high-growth SaaS company
- Familiarity with API gateway infrastructure or commerce tech stacks
- Hands-on experience with MCP or agentic AI infrastructure