Kraken is a technology company focused on creating a smart, sustainable energy system. As a Lead Site Reliability Engineer, you'll ensure the availability, performance, and scalability of products on the platform while leading a technical team to support millions of customers.

Responsibilities:

Team leadership
Own the Product Reliability team within Platform, working closely with the Director and Heads of Platform Engineering to define strategic objectives and team direction
Manage team priorities and ensure initiatives are completed within deadlines
Collaborate regularly and effectively with the Staff Platform Engineer in your functional team to deliver the technical implementation of the team’s strategic priorities
Lead delivery of major initiatives on clear timelines
Partner effectively in the wider Platform Engineering team to deliver outcomes
Build a strong culture of open communication where teammates can ask questions without fear, promoting a positive and inclusive team environment
People management
Line-manage the engineers in the Product Reliability team
Set clear performance expectations and goals for team members
Regularly review individual and team performance, offering actionable insights and constructive feedback to support and grow team members
Technical delivery
Deliver technical improvements such as small features and bug fixes
Support team delivery through code reviews, technology research and architectural guidance
Provide support for service offerings owned by your team
Help solve interesting and difficult problems. There’s a great opportunity for disruption in the global energy market

Requirements:

Excellent communication skills, working effectively with developers, product managers and other business stakeholders to understand and deliver impactful projects and reliability improvements
Record of successfully and consistently delivering critical path projects, on time and at scale
Meticulous organisation and planning skills
Experience of mentoring and coaching a team to perform at a high level of quality
Experience managing and supporting large-scale, internet-facing distributed systems serving millions of customers
Solid experience with AWS and at least one programming language. We use a wide range of AWS services, not just the standard few
Knowledge of security best practices, security and CI/CD tooling, and methodologies
Previous experience in leading technical delivery for small, highly autonomous teams
Previous experience as a technical individual contributor, preferably as a Site Reliability Engineer
Track record of effective collaboration with other teams and departments to drive holistic outcomes
A proactive, innovative mindset with the ability to drive continuous improvement
Previous experience working in a remote-first asynchronous global team
Familiarity with some of our tech stack:
PostgreSQL, or a similar RDBMS, particularly in Amazon RDS at scale
Docker and Kubernetes (we use Amazon EKS in production)
Python
Datadog, or a similar logging/monitoring tool
Message queues, event-driven async processing, or similar technologies (we use RabbitMQ)
Terraform, or a similar infrastructure-as-code tool
Experience with a Linux distribution
