Design, build, and operate reliable and scalable systems by defining and monitoring SLOs/SLIs.
Actively develop automation for infrastructure and operational workflows to eliminate toil and reduce MTTR.
Continuously analyze and optimize system performance and cost, providing data, insights, and recommendations to inform capacity planning.
Support security best practices through hands-on vulnerability remediation and threat mitigation.
Requirements
SRE & Cloud Engineering: Hands-on experience with SRE practices in production, strong AWS expertise, and experience with Kubernetes, networking, DNS, and Infrastructure as Code (Pulumi preferred, Terraform a plus).
Automation & Software Engineering: Strong software engineering fundamentals with an emphasis on code quality and maintainability.
This includes solid Python proficiency, deep knowledge of the Python ecosystem (testing, debugging, packaging), and a consistent focus on writing clean, well-structured code.
Reliability, Data & Operations: Lead incident response and RCAs, improve system reliability, and engage stakeholders to propose solutions, share learnings, and mentor others.
Tech Stack
AWS
Cloud
DNS
Kubernetes
Python
Terraform
Benefits
Work Your Way: Enjoy full flexibility – work from home, the office or a mix of both.
Work from anywhere for up to 30 days a year.
Access to learning resources, mentorship and a growth plan tailored to you.
Enjoy private healthcare, gym discounts, wellbeing programs and mental health support.