Platform Management: Take ownership of the management and ongoing maintenance of our Kubernetes platforms.
Cross-Team Collaboration: Work closely with multiple development squads to identify requirements for building and deploying complex applications.
AWS Automation: Design and implement the automated deployment and configuration of cloud-application workloads within AWS.
Observability: Develop, advise on, and troubleshoot monitoring and management frameworks for cloud-resident workloads.
Consultancy: Assist engineering teams in architecting manageable, scalable, and highly observable cloud-native applications.
Incident Resolution: Troubleshoot and resolve environment and configuration issues. Analyse logs and monitoring data to proactively communicate potential risks to stakeholders.
Environment Orchestration: Manage and communicate the state, versioning, and availability of various environments to Developers, QA, and wider team members.
Tooling Excellence: Maintain and optimise shared development tools for branching, tagging, building, releasing, and reporting.
On-Call Support: Willingness to provide weekend on-call support once a month.
Requirements
Containerisation: Practical experience with Kubernetes (EKS), Docker, or Rancher.
Technical Background: Proficiency in Python, Bash, or similar languages to automate repetitive tasks.
Linux Mastery: Proficiency with Linux distributions (e.g., Ubuntu, CentOS).
CI/CD: A deep understanding of continuous integration and continuous delivery principles.
AWS Expertise: A broad knowledge of AWS services, particularly CloudFormation, EC2, ECS, and EKS.
Version Control: Expert use of Git and GitHub within a distributed team environment.
Monitoring & Alerting: Strong understanding of tools such as PagerDuty, Zabbix, and LogicMonitor for system health visibility, alerting, and incident response.