AppOmni is a leader in SaaS security, focused on preventing data breaches through comprehensive visibility and monitoring solutions. The Senior Site Reliability Engineer (SRE) will ensure system reliability and performance, implement automation for deployment, and collaborate with development teams to optimize service-level objectives and incident response processes.

Responsibilities:

Monitor system availability
Implement automation for deployment and maintenance tasks
Proactively identify areas for optimization
Collaborate with the development team to establish and refine service-level objectives
Drive incident response and postmortem analysis to minimize service disruptions

Requirements:

Excellent technical and non-technical communication skills
Prior experience as an SRE or in a related discipline, with responsibility for maintaining high availability of a cloud-based application, troubleshooting performance bottlenecks, configuring monitoring and alerting, and conducting incident response in a blameless environment
A knack for reducing manual toil through automation and systematic thinking
Prior experience with CI/CD tools and processes, including pipelines-as-code (GitHub Actions, CircleCI)
At least 5 years of hands-on experience with Python or Golang
A solid background in configuration management and infrastructure-as-code (Terraform)
Solid experience in monitoring/observability systems (Grafana, Prometheus, etc.)
Demonstrated knowledge of container orchestration (Kubernetes/GKE)
Experience managing Kubernetes platforms and resources, and using Kubernetes deployment tools and patterns (Helm, GitOps, Knative)
Experience in FedRAMP or similar secure environments
Expertise working within highly controlled environments containing sensitive information
Experience designing and maintaining CI/CD pipelines using commercial solutions
Experience working on and within GCP and/or AWS
