Role Overview

What you’ll do:

Infrastructure & Automation

Operate and improve our cloud infrastructure to ensure systems remain stable, scalable and secure as usage grows.
Strengthen environment consistency and deployment safety through improved configuration and automation.
Reduce operational toil by automating repetitive processes and improving tooling.

Observability & Monitoring

Build and refine monitoring, alerting and logging to detect issues early and reduce customer impact.
Improve dashboards and production visibility for Engineering squads.
Raise the bar for observability before services reach production.

Production & Incident Management

Participate in on-call and respond to incidents in a structured, calm manner.
Lead lower-complexity incidents end-to-end and support higher-impact events.
Contribute to post-incident reviews and implement systemic improvements.

Reliability, Resilience & Risk

Contribute to improving service reliability targets and reducing repeat incidents.
Support capacity planning, performance optimisation and disaster recovery readiness.
Identify operational and security risks and contribute to preventative controls.

Requirements

**About You**

You’re a pragmatic, systems-minded engineer who stays calm under pressure and takes ownership of keeping production environments stable, secure and continuously improving.

You bring:

At least 3-4 years’ experience in Site Reliability, Platform Engineering, DevOps or similar roles, with a strong focus on production systems and operational excellence.
Experience supporting live production environments, including participation in on-call rotations and incident response. You understand what it means to own systems that customers rely on daily.
Confidence debugging and resolving issues under pressure, using structured problem-solving to diagnose root causes and restore service quickly.
Experience working with cloud infrastructure (e.g. AWS or similar), including managing environments that support scalable, customer-facing applications.
Familiarity with containerised environments and orchestration tools, and how they impact deployment, scaling and service reliability.
Experience contributing to infrastructure management and automation, helping create consistent, repeatable environments.
Familiarity with monitoring and alerting platforms, and an understanding of how strong observability improves reliability outcomes.
Scripting or automation capability, with the ability to reduce manual processes and improve operational efficiency.

Tech Stack

AWS
Cloud

Benefits

What’s in it for you?

You’ll join a purpose-driven company at a genuinely exciting stage of growth, with the opportunity to make a real impact on education at scale.

What we offer:

A hybrid working environment, with teams spending three days a week in our Melbourne office.
Learning and development opportunities, including a dedicated PD budget.
24/7 access to our Employee Assistance Program (EAP), including face-to-face, phone and live chat support.
A parental leave program for both primary and secondary carers.
A supportive, inclusive culture where your voice is valued and heard.
The chance to grow alongside a fast-moving, ambitious organisation.

Site Reliability Engineer
