Netskope is a market-leading cloud security company focused on redefining cloud, network, and data security. The Staff Site Reliability Engineer role focuses on improving the reliability, availability, and performance of our engineering stacks, developing software to solve operational problems, and enhancing our monitoring and alerting systems.
Responsibilities:
- Partner closely with service owners and engineers to develop reliable services driven by best practices
- Develop software and tools to solve a variety of problems across services and infrastructure
- Set up and manage monitoring, logging, and alerting systems for large-scale training runs and client-facing APIs
- Ensure training environments are consistently available and prepared across multiple clusters
- Develop and manage containerization and orchestration systems using tools such as Docker and Kubernetes
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
- Provide primary operational support and engineering for multiple large-scale distributed software applications
Requirements:
- Works with a sense of ownership
- Takes pride in building and operating scalable, reliable, secure systems
- Is comfortable with ambiguity and change
- Has a knack for troubleshooting complex systems and enjoys solving challenging problems
- Is proactive in identifying problems, performance bottlenecks, and areas for improvement
- Has experience working and collaborating with teams across different geographies and time zones
- Programming experience in at least one language
- Good understanding of distributed systems principles
- Deep understanding of Kubernetes and Docker
- Understanding of data technologies such as Kafka, Yugabyte, and Redis
- Good understanding of the AWS ecosystem
- Basic understanding of networking
- Exposure to infrastructure-as-code tools such as Terraform
- Familiarity with monitoring tools such as Prometheus, Grafana, or similar
- 8+ years of experience building core infrastructure