Ooma empowers people to connect through powerful communication experiences on its cloud-based platform. The company is looking for a Site Reliability Engineer to ensure service reliability at scale, improve application observability, and mentor team members while collaborating with other teams to support Ooma's applications.
Responsibilities:
- Become a subject matter expert in applications supporting Ooma customers
- Collaborate with Development, QA, and other SREs to evaluate, deploy, and debug applications
- Improve observability by implementing, refining, and adjusting application monitoring and thresholds
- Mentor team members to enhance application management practices
- Act as an escalation path and backup for junior team members, providing guidance during alerts and incidents
- Write automation scripts, set up CI pipelines, and review/evaluate software solutions and best practices
- Participate in on-call rotations, providing 24/7 support for Ooma services
Requirements:
- Strong background in production (24/7) support for large-scale environments required
- 6+ years of Linux administration and troubleshooting experience with a full-stack, application-support focus; 8+ years of overall work experience as an IT professional
- Proven expertise in advanced scripting using Python, Perl, and Bash
- Database administration experience with MySQL, MongoDB, or PostgreSQL required
- Must have experience with configuration management tools such as Ansible or Puppet
- Hands-on experience with cloud platforms such as OCI, AWS, or GCP required
- Proven ability to lead technical projects from inception to completion
- Strong collaboration skills and empathy for the end-user experience
- Excellent troubleshooting, communication (written and verbal), and cross-functional leadership skills
- Ability to work effectively in fast-paced, dynamic environments and manage ambiguity
- Quick learner with a self-starter mindset and ownership of outcomes
- Sound judgment in escalation and decision-making
- Bachelor's degree in Engineering/Computer Science or equivalent experience
- Experience with DevOps tools such as Docker, Kubernetes (K8s), GitLab CI/CD, Jenkins, and Terraform preferred
- Experience applying monitoring best practices with the ELK stack, Prometheus, Nagios, or Grafana preferred
- Experience with Agile tools such as Jira, Confluence, or similar preferred