Dataminr is a mission-driven company that provides AI-powered intelligence solutions. As a Site Reliability Engineer, you will ensure high-quality software delivery by building and maintaining tools for software engineers and data scientists, while also championing best practices within the engineering organization.

Responsibilities:

Work on our self-service internal developer platform, used by engineering teams to deploy containers, serverless functions and cloud resources
Maintain and improve our observability stack
Drive improvements in security, reliability, cost efficiency and performance
Troubleshoot large-scale distributed systems
Work closely with product engineering teams to enable efficient project delivery
Support our production environment as part of an on-call rota, helping with triage and resolution when issues arise

Requirements:

Experience managing Kubernetes clusters at scale (CKA a bonus)
Experience maintaining and hardening AWS infrastructure using Terraform
Development skills in Python or Go
Linux systems administration and TCP/IP networking
Experience maintaining observability tooling (e.g. the LGTM stack, OpenSearch)

Site Reliability Engineer III, Platform Engineering
