Role Overview
What you'll do:
Infrastructure & Automation
- Operate and improve our cloud infrastructure to ensure systems remain stable, scalable and secure as usage grows.
- Strengthen environment consistency and deployment safety through improved configuration and automation.
- Reduce operational toil by automating repetitive processes and improving tooling.
Observability & Monitoring
- Build and refine monitoring, alerting and logging to detect issues early and reduce customer impact.
- Improve dashboards and production visibility for Engineering squads.
- Raise the bar for observability before services reach production.
Production & Incident Management
- Participate in on-call and respond to incidents in a structured, calm manner.
- Lead lower-complexity incidents end-to-end and support higher-impact events.
- Contribute to post-incident reviews and implement systemic improvements.
Reliability, Resilience & Risk
- Contribute to improving service reliability targets and reducing repeat incidents.
- Support capacity planning, performance optimisation and disaster recovery readiness.
- Identify operational and security risks and contribute to preventative controls.
Requirements
**About You**
You’re a pragmatic, systems-minded engineer who stays calm under pressure and takes ownership of keeping production environments stable, secure and continuously improving.
You bring:
- 3-4+ years’ experience in Site Reliability, Platform Engineering, DevOps or similar roles, with a strong focus on production systems and operational excellence.
- Experience supporting live production environments, including participation in on-call rotations and incident response. You understand what it means to own systems that customers rely on daily.
- Confidence debugging and resolving issues under pressure, using structured problem-solving to diagnose root causes and restore service quickly.
- Experience working with cloud infrastructure (e.g. AWS or similar), including managing environments that support scalable, customer-facing applications.
- Familiarity with containerised environments and orchestration tools, and an understanding of how they affect deployment, scaling and service reliability.
- Experience contributing to infrastructure management and automation, helping create consistent, repeatable environments.
- Familiarity with monitoring and alerting platforms, and an understanding of how strong observability improves reliability outcomes.
- Scripting or automation capability, with the ability to reduce manual processes and improve operational efficiency.
Benefits
What’s in it for you?
You’ll join a purpose-driven company at a genuinely exciting stage of growth, with the opportunity to make a real impact on education at scale.
What we offer:
- A hybrid working environment, with teams spending three days a week in our Melbourne office.
- Learning and development opportunities, including a dedicated PD budget.
- 24/7 access to our Employee Assistance Program (EAP), including face-to-face, phone and live chat support.
- A parental leave program for both primary and secondary carers.
- A supportive, inclusive culture where your voice is valued and heard.
- The chance to grow alongside a fast-moving, ambitious organisation.