Cayuse is a company focused on empowering organizations to conduct globally connected research through advanced technology and exceptional service. As a Senior Infrastructure Engineer, you will drive the reliability, scalability, and efficiency of Cayuse's cloud-based infrastructure and SaaS products, while mentoring colleagues and helping to improve SRE practices.

Responsibilities:

Serve as a technical expert and mentor to other engineers, sharing knowledge and best practices
Lead by example, demonstrating strong technical proficiency in SRE principles and practices, specifically within the AWS ecosystem
Contribute to the development and implementation of SRE standards and guidelines, tailored to AWS best practices
Foster a culture of continuous learning and improvement within the team
Help others grow their automation skill sets
Design, build, and maintain robust and scalable infrastructure using Terraform, leveraging AWS services effectively
Develop and optimize CI/CD pipelines using Bitbucket Pipelines, integrating seamlessly with AWS deployment strategies
Implement and maintain monitoring and logging solutions to ensure system observability, utilizing AWS monitoring tools
Automate infrastructure and operational tasks to reduce toil and improve efficiency, with a focus on AWS automation
Contribute to the development and maintenance of automation tools and scripts
Troubleshoot complex infrastructure and application issues within the AWS environment
Respond to on-call Sev 1 incidents, particularly those occurring during Australian (AU) hours, and participate in a 24/7 on-call rotation approximately once per month
Participate in incident response and root cause analysis, contributing to the resolution of critical issues on AWS
Define and monitor SLOs/SLAs to ensure system reliability, using AWS metrics and monitoring
Contribute to disaster recovery planning and testing, utilizing AWS disaster recovery capabilities
Analyze system performance and identify areas for improvement within AWS
Proactively find and resolve potential issues before they become incidents
Collaborate with development, operations, and other teams to ensure smooth and efficient operations on AWS
Contribute to code reviews and technical discussions
Identify and implement process improvements to enhance team efficiency and effectiveness
Document best practices and create knowledge-sharing resources
Participate in agile ceremonies

Requirements:

Deep experience with AWS, including core services like EC2, S3, RDS, Lambda, CloudWatch, EKS, and a solid understanding of AWS networking (VPC, Security Groups) and security fundamentals (IAM)
4+ years of experience working with public cloud technologies (AWS preferred)
4+ years of experience developing monitoring and log analysis tools, including proficiency with Grafana and New Relic
Deep understanding of Site Reliability Engineering (SRE) principles, platforms, and tools
Proven experience with Terraform and Bitbucket Pipelines
Strong understanding of CI/CD pipelines and SDLC
Experience with Docker and Kubernetes
Proficiency in scripting languages (Bash, Python)
Experience implementing and managing security controls and tools
Understanding of security systems and best practices
Experience with Git and code branching/merging strategies
Experience with Agile methodologies (Scrum, Kanban)
Strong problem-solving and troubleshooting skills
Excellent communication and collaboration skills
Passion for mentoring and sharing knowledge
Automation-first mindset
Ability to own medium- to large-scale technical projects
Candidates MUST reside in Australia to be considered