Lead the design, implementation, and ongoing improvement of reliable, scalable, performant, and secure production platforms and services.
Work closely with cross-functional teams to build and maintain resilient infrastructure and deployment patterns.
Provide technical leadership and mentorship to engineers across the organisation, promoting strong engineering standards and operational best practice.
Participate in a 24x7 on-call rotation to support critical services and ensure platform availability.
Drive standardisation, automation, and documentation to improve consistency, reduce operational overhead, and support knowledge sharing.
Contribute across the full lifecycle of platform and service delivery, from design and build through to operation and optimisation.
Requirements
5+ years of experience in DevOps, SRE, platform engineering, or software engineering roles.
Strong Kubernetes experience at scale, with deep understanding of containers and container orchestration.
Hands-on experience with infrastructure-as-code and configuration management tools such as Terraform, Ansible, or Puppet.
Strong programming skills in at least one object-oriented language, along with effective scripting and automation capability.
Strong understanding of security principles and best practices across infrastructure, platforms, and services.
Significant hands-on experience with at least one major cloud platform, with broader exposure across AWS, GCP, and OCI.
Strong monitoring, alerting, and observability experience with tools such as Prometheus and Grafana.
Solid understanding of networking fundamentals and distributed systems.
Strong Linux and/or Windows systems administration experience.
Experience with software delivery automation, CI/CD pipelines, and secure SDLC practices, including exposure to static and dynamic security testing.
Good understanding of SRE concepts such as SLIs, SLOs, SLAs, toil reduction, availability, and observability.
Experience managing and scaling Elasticsearch in production is strongly preferred.