NinjaOne is passionate about building unified IT solutions that simplify the way IT organizations work. They are currently looking for a Senior Site Reliability Engineer to join their SRE team and help scale their products to millions of end-users by focusing on automation and observability.

Responsibilities:

Diagnose and resolve complex application and infrastructure issues
Participate in our 24x7 on-call rotation, Scrum, and deployment planning
Perform Root Cause Analysis (RCA) and provide recommendations for application teams
Improve availability and reduce customer impact using industry-leading observability tools
Ensure best-practice and security-minded architecture by influencing design decisions
Create and maintain technical documentation and SOPs
Develop software, scripts, or tooling to improve efficiency and reduce delivery time of applications and infrastructure
Other duties as needed

Requirements:

10+ years' experience in DevOps and/or Site Reliability Engineering roles
3+ years' experience with an object-oriented language (preferably Java, .NET or C++)
Intermediate or higher proficiency in Linux administration, scripting, and troubleshooting
Demonstrable knowledge of observability tools (New Relic, Splunk, Datadog)
Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc.)
Experience with cloud automation and infrastructure-as-code (IaC) toolsets, primarily CloudFormation but also including Terraform, Helm, and Ansible. CDK is a plus
Good understanding of containers, Fargate, Kubernetes, and overall distributed microservice architectures
Passionate about automation, security, and self-service environments/portals
Hands-on experience with CI/CD and SDLC (Software Development Life Cycle) processes
Effective communication skills, both verbal and written
