TwinStream is a company focused on delivering technical excellence and high-quality service to its clients, particularly government organizations. They are seeking an experienced Site Reliability Engineer to ensure the availability, performance, and cost-effectiveness of their services while collaborating with development teams and improving infrastructure and delivery pipelines.

Responsibilities:

Collaborate with Software Engineers to improve reliability and performance in their subsystems
Partner with System Administrators to automate away toil and eliminate alerts
Evolve observability and monitoring capabilities to identify and solve problems before they impact the business
Support development environments to help us achieve our delivery and quality goals
Research and evaluate technologies, tools and services to influence buy-vs-build decisions
Develop expertise in diverse technical and business domains
Expand your knowledge of the technical stacks used

Requirements:

Experience using AWS
Experience using modern configuration management tools (such as Ansible, Chef or similar)
Experience working with Terraform
Experience working with Docker containers and container orchestration tools (such as Kubernetes, OpenShift or Docker Swarm)
Experience both using and maintaining CI/CD tools (such as Jenkins or similar)
Experience with monitoring tools such as InfluxDB, Prometheus or Grafana
Experience of event-driven integration with MQ messaging (RabbitMQ or similar AMQP solution)
Good understanding of relational databases and SQL
Linux command line, administration and shell scripting
Working knowledge of network security protocols
Experience using, developing with and maintaining cloud hosting services (ideally AWS EC2, RDS, S3, Lambda)
Industry experience writing well-tested code in one of our platform languages (Java, Go, Python or similar)
Knowledge of cross-domain principles & technologies
Experience of working in a service management environment
Practical experience applying observability patterns in previous systems
Experience creating and monitoring system availability metrics and using them to drive work that reduces downtime

Site Reliability Engineer (Contract outside of IR35)
