We are looking for a Cloud Site Reliability Engineer (SRE) for our client in Calgary, AB.
Job Title: Cloud Site Reliability Engineer (SRE)
Job Location: Calgary, AB
Job Type: Contract
Job Overview:
Requirements/Must Have:
- 1+ years of experience with Azure and AWS.
- Strong verbal and written communication skills.
- Excellent presentation skills.
- Ability to communicate effectively with stakeholders.
Responsibilities:
- Design, implement, and maintain comprehensive monitoring and alerting systems such as Azure Monitor, AWS CloudWatch, Application Insights, and Log Analytics.
- Automate repetitive manual operations (toil) such as environment provisioning, system patching, and scaling, using infrastructure-as-code (IaC) tools such as Terraform and Ansible to manage infrastructure.
- Actively manage incident response, root cause analysis (RCA), and post-mortem investigations to improve system reliability and minimize mean time to resolution (MTTR).
- Deploy and configure Cloud SRE Agent to automate incident investigation, execute remediation steps (restart, scale, rollback), and manage routine tasks.
- Analyze usage patterns to optimize cloud resources, ensuring high availability and performance while managing costs via Azure Cost Management.
- Integrate automation workflows into CI/CD pipelines (e.g., GitHub Actions or Azure Pipelines) to ensure reliable deployments.
Skills:
- Azure.
- AWS.
- Terraform.
- Ansible.
- GitHub Actions.
- Azure Pipelines.