About this role

HiBob is a company focused on AI-driven operations and is seeking a Senior Site Reliability Engineer to improve production stability and automation. The role involves collaborating with global DevOps teams to manage AWS and Kubernetes environments and to drive operational excellence.

Responsibilities:

Design, build, and operate production-grade Kubernetes infrastructure on AWS
Develop AI agents to handle incidents and perform root cause analysis
Build and maintain GitOps-based CI/CD pipelines using GitHub Actions and ArgoCD
Develop internal DevOps tooling and developer self-service platforms
Own monitoring, observability, and operational excellence using Datadog
Collaborate with engineering teams to improve delivery speed and reliability

Requirements:

5+ years of experience as an SRE or DevOps Engineer (this is a hard requirement)
Extensive experience managing live, high-traffic SaaS environments; developer-only backgrounds without ops experience will not be a fit
Proven mastery of Kubernetes and AWS in production settings
A strong understanding of or direct experience with AI/LLM technologies
Hands-on experience with Datadog for monitoring and incident response
Ability to work independently without direct daily oversight, managing production incidents and on-call responsibilities
Located in the East Coast time zone to provide coverage overlap with our global teams
Advanced proficiency in Python (preferred) or Go for automation

Senior Site Reliability Engineer - Remote EST
