Kinaxis is a global leader in end-to-end supply chain management, dedicated to supporting customer excellence. The Senior Cloud Engineer in Site Reliability Engineering will contribute to the production ecosystem, ensuring service availability and performance while collaborating with engineering teams to implement secure and efficient systems.
Responsibilities:
- Within cloud operations, contribute to and support the production ecosystem, with emphasis on delivering Software-as-a-Service to customers, while collaborating with engineering teams to ensure systems are designed and implemented with operability, security, performance, and availability in mind
- The operations role will continue to evolve, leveraging infrastructure, software engineering principles, and financial governance to address operational challenges with a focus on increased automation
- Deliver customer excellence, making sure that Kinaxis meets all SLA objectives
- Apply software engineering principles to operational challenges with a focus on automation, self-healing and monitoring solutions
- Manage the lifecycle of customer production systems: deploy, upgrade, configure, and decommission
- Triage service requests and incidents
- Excel at overcoming operational challenges
- Support workload migrations from our physical data centers to cloud environments
- Participate in an on-call rotation: investigate and resolve incidents, and provide root cause analysis relating to production systems
- Lead team members in achieving professional excellence utilizing industry best practices
- Actively lead and engage with the team to develop and grow its skills by sharing experiences and the results of research on the latest technology trends
- Work on-site at the customer's office for 25% of working time
Requirements:
- Prior experience in an infrastructure engineering or site reliability engineering role
- 5+ years of experience deploying and supporting applications
- 5+ years of experience managing public cloud platforms (both console and API) such as GCP, Azure, or AWS
- 5+ years of experience developing in Ansible, Terraform, PowerShell, Bash, and Python
- Proficiency in English and Mandarin is required
- Prior experience working in ITIL-based methodologies, including Incident and Change Management
- Ability to work both independently and as part of a team
- Practical experience with Applications (Windows, Linux)
- Practical experience with Containers (Helm, Docker)
- Practical experience with Orchestration/Automation (Kubernetes, Ansible, Terraform, Jenkins)
- Practical experience with System monitoring and centralized logging (Datadog, Prometheus, ELK)
- In-depth and proactive communication and documentation skills
- Experience with and understanding of Windows Active Directory concepts would be a plus