Lead Ops / SRE teams to build / maintain “Infrastructure as Code”, software services (PaaS and SaaS), security policies and continuous integration / deployment processes
Remove technical debt, deliver security hardening, and drive continued optimization of cloud-based environments
Maintain critical production services with a view to providing the best possible uptime, with a strong focus on the reliability of tier 1 / mission-critical 24/7 services
Work with diverse technical and non-technical teams, including Development, QA, IT Operations, Customer Operations and Project Management teams
Lead FinOps processes for continuous review and ongoing cost optimization, including maintaining and forecasting cloud OPEX spend
Ensure 24/7 technical support and customer Service Level Agreements are met, with ultimate accountability for uptime and resiliency across a group of products
Drive automation to aid productivity, minimizing traditional operational effort and maximizing the use of Infrastructure as Code
Manage Customer Reliability Engineering activities, driving Application Monitoring, Metrics, Incident Reviews and Long-Term Actions, and support BISO activities and the implementation of InfoSec changes
Requirements
Experience leading SRE team(s), including rotating staff across products / platforms, mentoring and development, objective setting and general line management
Experience building and operating “Infrastructure as Code” and other DevOps practices, including CI/CD processes
Experience maintaining and supporting critical, high-revenue business applications, including diagnosis and resolution of complex system and application issues
Good understanding of defensive, corrective and detective controls, and of general application troubleshooting
Experience hosting mission-critical apps with public cloud providers such as AWS and / or Azure, using services such as EC2, ECS, AKS or ACA
Experience working with containerized workloads (e.g. Docker) and orchestration via Kubernetes
Experience with security tooling such as Qualys, Wiz, Trufflehog, GitHub Advanced Security, etc.
Ability to maintain system / application documentation for technical and non-technical audiences, and to collaborate with SRE Managers / Leads and the Cloud Centre of Excellence to adopt best-of-breed practices
Tech Stack
AWS
Azure
Cloud
Docker
EC2
Kubernetes
Benefits
Comprehensive, multi-carrier program for medical, dental and vision benefits
401(k) with match and an Employee Share Purchase Plan
Wellness platform with incentives, Employee Assistance and Time-off Programs
Short- and Long-Term Disability, Life and Accidental Death Insurance, Critical Illness, and Hospital Indemnity
Family Benefits, including bonding and family care leaves, adoption and surrogacy benefits
Health Savings, Health Care, Dependent Care and Commuter Spending Accounts