Ooma empowers people to connect through powerful communication experiences on its cloud-based platform. The company is looking for a Site Reliability Engineer to ensure service reliability at scale, improve application observability, mentor team members, and collaborate with other teams to support Ooma's applications.

Responsibilities:

Become a subject matter expert in applications supporting Ooma customers
Collaborate with Development, QA, and other SREs to evaluate, deploy, and debug applications
Improve observability by implementing, refining, and adjusting application monitoring and thresholds
Mentor team members to enhance application management practices
Act as an escalation path and backup for junior team members, providing guidance during alerts and incidents
Write automation scripts, set up CI pipelines, and review/evaluate software solutions and best practices
Participate in on-call rotations, providing 24/7 support for Ooma services

Requirements:

Strong background in production (24/7) support for large-scale environments required
6+ years of Linux administration and troubleshooting experience with a full-stack, application-support focus; 8+ years of overall experience as an IT professional
Proven expertise in advanced scripting using Python, Perl, and Bash
Database administration experience with MySQL, MongoDB, or PostgreSQL required
Must have experience with configuration management tools such as Ansible or Puppet
Hands-on experience with cloud platforms such as OCI, AWS, or GCP required
Proven ability to lead technical projects from inception to completion
Strong collaboration skills and empathy for the end-user experience
Excellent troubleshooting, communication (written and verbal), and cross-functional leadership skills
Ability to work effectively in fast-paced, dynamic environments and manage ambiguity
Quick learner with a self-starter mindset and ownership of outcomes
Sound judgment in escalation and decision-making
Bachelor's degree in Engineering/Computer Science or equivalent experience
Experience with DevOps tools such as Docker, Kubernetes, GitLab CI/CD, Jenkins, and Terraform preferred
Experience with monitoring best practices using ELK stack, Prometheus, Nagios, or Grafana preferred
Experience with Agile tools such as Jira, Confluence, or similar preferred
