Hands-on Reliability & System Engineering: Design, build, and operate reliable and scalable systems by defining and monitoring SLOs/SLIs, working directly on production infrastructure, and collaborating closely with software engineers on system design and reliability improvements
Automation, Operations & Incident Response: Actively develop automation for infrastructure and operational workflows to eliminate toil and reduce MTTR, participate in and lead incident response, and drive blameless post-incident reviews with concrete follow-ups implemented in code and tooling
Performance, Capacity & Security: Continuously analyze and optimize system performance and cost, provide data, insights, and recommendations to inform capacity planning, and support security best practices through hands-on vulnerability remediation and threat mitigation

SRE & Cloud Engineering: Hands-on experience with SRE practices in production, strong AWS expertise, Kubernetes, networking, DNS, and Infrastructure as Code (Pulumi preferred, Terraform a plus)
Automation & Software Engineering: Demonstrate strong software engineering fundamentals with an emphasis on code quality and maintainability, including solid Python proficiency and deep knowledge of the Python ecosystem
Reliability, Data & Operations: Add stakeholder engagement and mentoring, e.g. lead incident response and RCAs, improve system reliability, and engage stakeholders to propose solutions, share learnings, and mentor others.

Work Your Way: Enjoy full flexibility – work from home, the office or a mix of both. Plus, work from anywhere for up to 30 days a year.
Grow with us: Get access to learning resources, mentorship and a growth plan tailored to you.
Thrive and perform: Enjoy private healthcare, gym discounts, wellbeing programs and mental health support.

DevOps Engineer

Key skills