Sonio is a mission-driven company focused on improving women's and children's health through technological innovation. As the first SRE in the US, you will own the platform's stability and releases, ensuring a secure and resilient production environment while working closely with the DevOps team.
Responsibilities:
- Own US coverage for releases and incidents as the first responder during Pacific Time (PT) hours
- Bridge infrastructure and code: work hand-in-hand with our DevOps team on Kubernetes, Terraform, and AWS, and read and patch Elixir code yourself so you are not blocked waiting for a backend engineer
- Drive incident response end-to-end, managing triage, mitigation, and blameless post-mortems with real follow-through
- Improve the platform’s operability by defining SLOs, tuning alerts to reduce toil, and extending observability (metrics, logs, tracing) where coverage is lacking
- Transfer operational knowledge from France to the US by authoring runbooks and documenting procedures so local teams are empowered to act when something breaks
- Support compliance and security in our regulated medical-device environment, maintaining HIPAA-aligned controls and an audit-ready infrastructure
Requirements:
- 4+ years of experience in SRE, DevOps, or Production Engineering, including significant on-call experience on a 24/7 product
- You have a hybrid, code-literate mindset: you are an infrastructure expert who can also navigate a backend codebase to triage and patch issues independently
- You bring strong technical foundations in Kubernetes, Terraform, and AWS, along with the ability to architect and tune your own observability signals
- You are highly autonomous and comfortable making technical decisions with limited supervision, which is essential given the timezone difference with France
- You maintain operational rigor and stay calm under pressure, with the written English skills necessary to produce high-quality runbooks and handle async handoffs