Kraken is a mission-focused company rooted in crypto values, aiming to accelerate the global adoption of crypto. As a Senior Site Reliability Engineer, you will ensure the reliability, scalability, and performance of systems that support Kraken's growth initiatives, collaborating with development teams and managing infrastructure.

Responsibilities:

Manage and support infrastructure for Growth teams, including Nomad and the wider HashiStack, databases, and other underlying systems
Maintain and troubleshoot GitLab CI pipelines, ensuring reliable and fast build, test, and deployment cycles
Provide operational support across Onboarding, Acquire, and Engage teams, helping debug issues in staging and production environments
Participate in incident response and post-incident reviews to improve system resilience
Consult with teams on performance, monitoring, and alerting best practices
Build tooling, automation, and dashboards to improve observability and empower development teams
Collaborate with developers, QA, and product managers to streamline development and release cycles
Support a fully distributed team operating across multiple time zones

Requirements:

5+ years in a DevOps or SRE role
Strong experience managing infrastructure with Consul, Vault, and Terraform
Proficiency with databases (SQL and NoSQL) and experience operating them in production
Proficiency with Git version control and CI/CD configuration
Deep understanding of monitoring and alerting systems, preferably Prometheus and Grafana
Ability to debug complex issues involving distributed systems, networks, and Linux operating systems
Experience with containerization and orchestration (Docker and Nomad; Kubernetes a plus)
Strong scripting skills (e.g., Bash, Python, or Go)
Self-starter with the ability to thrive while working independently and remotely in a fast-paced environment
Ability to collaborate effectively with multiple teams and switch context across projects
Interest in security and consideration of the security implications of development and operational decisions
Experience with benchmarking, performance tuning, and identifying system bottlenecks
Familiarity with incident management best practices and tooling
Interest in lower-level programming languages such as Rust
Experience integrating with APIs (GitLab, Jira, Slack)
Background working with distributed systems and technologies (Kafka, gRPC, Redis, etc.)
Passion for building reliable, user-facing systems that scale

Senior Site Reliability Engineer - Growth
