Babylist is a leading platform for expecting and new families, focused on reshaping the kids and baby market. In the Staff Software Engineer, Site Reliability role, you will own the infrastructure and reliability practices that support millions of users while actively evolving our AWS environment, CI systems, and developer tooling.
Responsibilities:
- Manage and evolve our AWS environment using Terraform, keeping EKS clusters, databases, and core services current and performant
- Own the speed and reliability of our CI systems for the full Engineering org — every deploy starts here
- Be the person engineers turn to when environments break; unblock them fast across local, staging, and production
- Establish and socialize alerting and on-call best practices so the right people get paged for the right reasons
- Lead or support incident response, drive post-incident reviews, and close the loop so the same thing doesn't happen twice
- Contribute to architectural decisions that shape how Babylist's infrastructure evolves over the next several years
Requirements:
- Deep hands-on Terraform expertise — you own IaC, not just contribute to it
- Proven AWS experience at scale — EKS, RDS, cloud networking, DNS, CDNs, load balancers — you know the gotchas
- Experienced operating Kubernetes in production — you've debugged the hard stuff, not just deployed the easy stuff
- Comfortable designing and improving CI/CD systems — CircleCI, GitHub Actions, or similar; you care about developer velocity, not just pipeline uptime
- Strong observability instincts — Datadog, Sentry, PagerDuty, Cronitor — you build alerting that's actionable, not noisy
- Experienced with on-call and incident management — you've run the post-mortems and actually changed things afterward
- Comfortable supporting developers across local, staging, and production — you're a resource, not a gatekeeper
- You naturally reach for AI in your work — at Babylist, every team uses AI daily. You're already using it to move faster and improve your output, and you stay curious about what's coming next