BMO is the Bank of Montreal, dedicated to creating lasting positive change for customers and communities. They are seeking a Site Reliability Engineer to design and manage the deployment, configuration, and monitoring of services in production, ensuring reliability and efficiency through automation and collaboration between development and operations teams.
Responsibilities:
- Deploys, configures, and monitors code, and manages the availability, latency, change management, emergency response, and capacity management of services in production
- Helps the development and operations teams establish service level indicators (SLIs), service level objectives (SLOs), and error budgets
- Automates tasks such as log analysis, performance tuning, patch application, testing of production settings, incident response, and post-mortem analysis to increase efficiency and decrease risk
- Supports system design consulting, platform management, and capacity planning
- Debugs production issues across services and levels of the technology stack
- Improves service health visibility by recording metrics, logs, and traces across all services in order to pinpoint the causes of an incident
- Computes the cost of SLA breaches and assists management in calculating the impact of system reliability. Helps development and operations teams understand the cost of downtime
- Focus is primarily on business/group within BMO; may have broader, enterprise-wide focus
- Exercises judgment to identify, diagnose, and solve problems within given rules
- Works independently on a range of complex tasks, which may include unique situations
- Broader work or accountabilities may be assigned as needed
- Takes measured risks while protecting the bank by applying our Risk Management Framework in the execution of the role, in line with our Risk Culture and within our approved Risk Appetite, making sound and risk-informed decisions that align to business strategy, protect assets, and adhere to applicable policy documents (Frameworks, Policies, Standards, Procedures and Supporting documents), laws, and regulations
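For candidates less familiar with the SLO, error budget, and cost-of-downtime items above, the arithmetic behind them can be sketched as follows. This is only an illustration, not BMO's method: the function names, the 99.9% availability target, the 30-day window, and the cost-per-minute figure are all hypothetical assumptions.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability SLO permits over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if the SLO is breached)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget


def downtime_cost(downtime_minutes: float, cost_per_minute: float) -> float:
    """Simple linear estimate of what an outage costs the business."""
    return downtime_minutes * cost_per_minute


if __name__ == "__main__":
    # Hypothetical inputs: 99.9% SLO, 30-day window, 10.8 minutes of downtime,
    # and an assumed cost of 500 currency units per minute of outage.
    print(error_budget_minutes(0.999))      # about 43.2 minutes
    print(budget_remaining(0.999, 10.8))    # about 0.75 of the budget left
    print(downtime_cost(10.8, 500.0))       # about 5400
```

A linear cost model is the simplest possible choice; in practice the cost of downtime may depend on time of day, affected services, and contractual SLA penalties.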
Requirements:
- Foundational level of proficiency in DevOps
- Foundational level of proficiency in Cybersecurity and privacy concepts, principles and solutions
- Foundational level of proficiency in Emotional agility
- Foundational level of proficiency in IT Infrastructure Library (ITIL)
- Foundational level of proficiency in Robotic Process Automation
- Foundational level of proficiency in Cloud Computing
- Foundational level of proficiency in Configuration Management
- Foundational level of proficiency in Container Orchestration
- Foundational level of proficiency in System Design and Implementation
- Foundational level of proficiency in Incident management
- Foundational level of proficiency in Learning Agility
- Foundational level of proficiency in Building and managing relationships
- Intermediate level of proficiency in API Management
- Intermediate level of proficiency in Automation and Automation Pipelines
- Intermediate level of proficiency in Automated Testing
- Intermediate level of proficiency in Quality Assurance and Control
- Intermediate level of proficiency in Verbal & written communication skills
- Intermediate level of proficiency in Collaboration & team skills
- Intermediate level of proficiency in Analytical and problem-solving skills
- Intermediate level of proficiency in Data-driven decision making
- Typically between 4 and 6 years of relevant experience and a post-secondary degree in a related field of study, or an equivalent combination of education and experience
- Technical proficiency gained through education and/or business experience