Kyndryl is a company that designs, builds, manages, and modernizes mission-critical technology systems. They are seeking a Site Reliability Engineer to ensure reliability and innovation in their information systems while driving continuous improvement and delivering exceptional service to customers.

Responsibilities:

Ensure reliability, resiliency, and innovation in our information systems and ecosystems
Analyze business needs, tackle complex problems, and provide strategic advice and designs
Be involved in every stage of the software lifecycle, from building and testing to deploying changes and maintaining robust systems
Build trusted relationships with customers and partner with them for success
Work on end-to-end services, spanning customer sites and platforms
Collaborate proactively with a talented team of professionals
Take ownership of responsibilities and constantly seek innovative solutions
Implement cutting-edge tools that enhance operations, improve reliability, and gather valuable feedback on platforms
Identify and mitigate common operational issues to deliver seamless experiences to customers

Requirements:

10+ years of experience in operational management, including incident management and escalations
Experience designing and implementing application monitoring to ensure reliability and performance meet or exceed business goals
Experience implementing strategies to cap operational load and handle overflow using appropriate tooling and metrics, and defining service level indicators (SLIs) and service level objectives (SLOs) in collaboration with stakeholders and business, development, DevSecOps, and operations teams
Solution and design experience in an enterprise environment: Windows Server, Linux server (RHEL preferred), UNIX (AIX, Solaris), storage, and hyperscaler cloud platforms (AWS, Azure, Google Cloud Platform), as well as OpenShift
Experience working with data formats and scripting languages (JSON, YAML, Bash, and/or PowerShell)
BS degree in Computer Science, Engineering, or another highly technical or scientific discipline
Expertise with Ansible, Terraform, and Python
Experience with distributed technologies as well as dynamic resource management frameworks such as Kubernetes
Expertise in leveraging open-source tooling such as Prometheus, Grafana, or Loki

Site Reliability Engineer DWS Ohio
