Kinaxis is a global leader in end-to-end supply chain management, dedicated to supporting customer excellence. The Senior Cloud Engineer in Site Reliability Engineering will contribute to the production ecosystem, ensuring service availability and performance while collaborating with engineering teams to implement secure and efficient systems.

Responsibilities:

Within the cloud operations function, the incumbent contributes to and supports the production ecosystem, with an emphasis on Software-as-a-Service delivered to customers, while collaborating with engineering teams to ensure systems are designed and implemented for operability, security, performance, and availability
The operations role will continue to evolve, drawing on infrastructure, software engineering principles, and financial governance to address operational challenges with a focus on increased automation
Deliver customer excellence, making sure that Kinaxis meets all SLA objectives
Apply software engineering principles to operational challenges with a focus on automation, self-healing and monitoring solutions
Manage the lifecycle of customer production systems: deploy, upgrade, configure, and decommission
Triage service requests and incidents
Excel at overcoming operational challenges
Support workload migrations from our physical data centers to cloud environments
Participate in an on-call rotation: investigate and resolve incidents, and provide root cause analysis for production systems
Lead team members in achieving professional excellence utilizing industry best practices
Actively lead and engage with the team to develop and grow its skills by sharing experiences and the results of research on the latest technology trends
Required to be on-site at the customer's office for 25% of working time

Requirements:

Prior experience in an infrastructure engineering or site reliability engineering role
5+ years of experience deploying and supporting applications
5+ years of experience with managing public cloud platforms (both console and API) like GCP, Azure or AWS
5+ years of experience developing in Ansible, Terraform, PowerShell, Bash, and Python
Proficiency in English and Mandarin is required
Prior experience working in ITIL-based methodologies, including Incident and Change Management
Ability to work independently and as part of a team
Practical experience with Applications (Windows, Linux)
Practical experience with Containers (Helm, Docker)
Practical experience with Orchestration/Automation (Kubernetes, Ansible, Terraform, Jenkins)
Practical experience with System monitoring and centralized logging (Datadog, Prometheus, ELK)
Strong, proactive communication and documentation skills
Experience with and understanding of Windows Active Directory concepts is a plus

Senior Cloud Engineer, Site Reliability Engineering