About this role

Dataminr is a mission-driven company that focuses on providing real-time intelligence through its AI-powered platform. As a Site Reliability Engineer, you will build and maintain tools for software engineers and data scientists, ensuring high-quality software delivery and championing best practices within the engineering organization.

Responsibilities:

Work on our self-service internal developer platform, used by engineering teams to deploy containers, serverless functions and cloud resources
Maintain and improve our observability stack
Drive improvements in security, reliability, cost efficiency and performance
Troubleshoot large-scale distributed systems
Work closely with product engineering teams to enable efficient project delivery
Support our production environment as part of an on-call rota, helping with triage and resolution when issues arise

Requirements:

Experience managing Kubernetes clusters at scale (CKA a bonus)
Maintaining and hardening AWS infrastructure using Terraform
Development skills in Python or Go
Linux systems administration and TCP/IP networking
Experience maintaining observability tooling, e.g. the LGTM stack or OpenSearch

Site Reliability Engineer III, Platform Engineering
