Rackspace Technology is a multicloud solutions expert that combines expertise with leading technologies to deliver end-to-end solutions. They are seeking a Site Reliability Engineer to work with customers on implementing observability solutions and maintaining scalable systems to improve performance and reliability.
Responsibilities:
- Work with customers to implement observability solutions
- Build and maintain scalable systems and robust automation that supports engineering goals
- Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance
- Proactively gather and analyze both metric and log data from systems and applications to perform anomaly detection, performance tuning, capacity planning and fault isolation
- Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability, security and performance standards
- Collaborate with team members to document and share solutions
- Maintain a deep understanding of the customer’s business as well as their technical environment
- Identify performance bottlenecks and anomalous system behavior, and resolve the root cause of service issues
Requirements:
- Bachelor's degree in engineering/computer science or equivalent
- Senior-level experience with Site Reliability Engineering; DevOps; code-level application support and troubleshooting; AWS infrastructure design, implementation and optimization; and automation for deployment, scaling and reliability
- Experience with the observability tools Splunk, Datadog, and SignalFx
- Experience deploying, maintaining and supporting software applications/services in the AWS ecosystem
- Proactive approach to identifying problems and solutions
- Experience writing code in one or more interpreted languages such as Python, PHP, Perl, Ruby, or Linux shell
- Experience with Terraform or CloudFormation scripting
- Experience with the configuration management tool Ansible
- Experience with Docker and Kubernetes
- Experience with standard software development best practices and tools such as code repositories (Git preferred)
- Experience working in an agile software development environment
- Good understanding of pricing/cost models across AWS services, especially compute, storage, and database offerings
- A clear understanding of network and system management solutions
- Excellent organizational and project management skills
- Excellent communication, critical thinking & analytical skills