Okta is The World’s Identity Company, providing secure access and authentication solutions. The Associate Site Reliability Engineer will design, build, and monitor production infrastructure while ensuring security and reliability through automation and best practices.
Responsibilities:
- Designing, building, running, and monitoring Okta's production infrastructure
- Evangelizing security best practices and leading initiatives and projects to strengthen our security posture for critical infrastructure
- Responding to production incidents and determining how we can prevent them in the future
- Triaging and troubleshooting complex production issues to ensure reliability and performance
- Identifying and automating manual processes
- Continuously evolving our monitoring tools and platform
- Promoting and applying best practices for building scalable and reliable services across engineering
- Developing and maintaining technical documentation, runbooks, and procedures
- Supporting a 24x7 online environment as part of a global on-call rotation
Requirements:
- Are always willing to go the extra mile: see a problem, fix the problem
- Are passionate about encouraging the development of engineering peers and leading by example
- Have experience automating, securing, and running large-scale production Java/Tomcat and containerized services in AWS (EC2, ECS, KMS, Kinesis, RDS) or other cloud providers
- Have knowledge of core Kubernetes concepts (Pods, Deployments, Services), with proven ability to use kubectl for basic operations and to diagnose issues by inspecting logs and pod status
- Have knowledge of CI/CD principles, Linux fundamentals, OS hardening, networking concepts, and Internet protocols
- Have an understanding of and familiarity with configuration management and infrastructure-as-code tools like Chef, Ansible, and Terraform
- Have strong skills in operational tooling languages such as Ruby, Python, or Go
- Understand both relational and non-relational datastores, including replication and clustering strategies
- 1+ years of experience architecting and running complex AWS or other cloud networking infrastructure resources
- 1+ years of experience with Ansible, Chef, and Terraform
- Strong Linux understanding and experience
- BS in Computer Science (or equivalent experience)
- This position requires the ability to access federal environments and/or protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire