Support software engineers by creating, maintaining, and improving observability and alerting tools and frameworks. You embrace the use of AI, leveraging agentic tools to eliminate toil and streamline your daily tasks.
Own the Service Level Objective (SLO) framework and assist in the design and maintenance of Service Level Indicators (SLIs) and objectives to ensure service reliability.
Own the incident management process by defining best practices and standards and by ensuring continuous improvement through post-mortems and chaos engineering. While developers handle incidents within their scope, you may step in as Incident Commander during high-severity incidents to lead coordination efforts.
Develop and maintain tools, such as Terraform modules or Go apps, to help automate and enhance reliability across services.
Build and promote reporting on operational metrics and incidents to drive distributed and continuous improvement.
Requirements
1 to 5 years of experience in SRE, DevOps, or Software Engineering roles
Working in a multidisciplinary environment requires strong communication skills: you'll need to adapt your communication to other teams' level of expertise and be able to understand their needs.
Strong knowledge of observability tools (e.g., Datadog) and understanding of metrics, logging, and tracing.
Troubleshooting and on-call experience in production environments, including diagnosing and resolving technical issues effectively (experience with Kubernetes is a plus).
Full working proficiency in English
Fit with our BlaBlaPrinciples
Thriving in a collaborative, fast-growing and innovative environment
Ability to take ownership, align with business priorities, and navigate different contexts
Nice to have:
Familiarity with incident management platforms (e.g., Grafana IRM)
Experience working with Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
Exposure to programming in Go or a strong interest in learning it.
Experience integrating OpenTelemetry
Backend services are built using multiple programming languages: while development skills aren't required, familiarity with object-oriented programming and scripting languages is an advantage.
Familiarity with web/mobile testing tools or a strong curiosity to understand how software is tested at scale.
Tech Stack
Grafana
Kubernetes
Terraform
Go
Benefits
Hybrid status for this role: 2-3 days at the office
4 additional weeks on top of legal maternity/paternity leave