AppOmni is a leader in SaaS security, focused on preventing data breaches through comprehensive visibility and monitoring solutions. The Senior Site Reliability Engineer (SRE) will ensure system reliability and performance, implement automation for deployment, and collaborate with development teams to optimize service-level objectives and incident response processes.
Responsibilities:
- Monitoring system availability and performance
- Implementing automation for deployment and maintenance tasks
- Proactively identifying areas for optimization
- Collaborating with development teams to establish and refine service-level objectives
- Driving incident response and postmortem analysis to minimize service disruptions
Requirements:
- Excellent technical and non-technical communication skills
- Prior experience as an SRE or in a related discipline, responsible for maintaining high availability of a cloud-based application, troubleshooting performance bottlenecks, configuring monitoring and alerting, and conducting incident response in a blameless environment
- A knack for reducing manual toil through automation and systematic thinking
- Prior experience working with CI/CD tools and processes, including pipelines-as-code (e.g., GitHub Actions, CircleCI)
- 5+ years of hands-on experience with Python or Golang
- A solid background in configuration management and infrastructure-as-code (e.g., Terraform)
- Solid experience with monitoring/observability systems (Grafana, Prometheus, etc.)
- Demonstrated knowledge of container orchestration (Kubernetes/GKE)
- Experience managing Kubernetes platforms and resources, and using Kubernetes deployment tools and patterns (Helm, GitOps, Knative)
- Experience in FedRAMP or similar secure environments
- Expertise working within highly controlled environments containing sensitive information
- Experience designing and maintaining CI/CD pipelines using commercial solutions
- Experience working with GCP and/or AWS