Ensure the resilience and availability of Kohl’s systems and applications
Collaborate closely with development teams
Contribute to architectural designs
Conduct risk assessments and design for failure
Implement robust monitoring and failover mechanisms
Drive error budget and Service Level Objective (SLO) adoption across products
Drive incident response efforts, perform root cause analysis and implement preventative measures to enhance system reliability
Establish consistent practices that elevate Kohl’s operational excellence through automation and process improvements
Follow the software development lifecycle and drive reliability, observability, and efficiency across product teams within an assigned domain
Identify repeated toil and find opportunities for automation and risk reduction
Participate in an on-call rotation to respond to production incidents, and conduct blameless retrospectives and root-cause analyses (RCAs) to drive a culture of continuous improvement
Proactively identify potential failures before they cause outages through chaos engineering, edge-case and failure-mode analysis, and design reviews
Advise on capacity planning and provide continuous assessments of system behavior and resource consumption
Work with product managers to identify and prioritize work on reliability best practices (e.g., leveraging SLIs, SLOs, and error budgets)
Mentor and assist engineers on the team
Requirements
Bachelor's degree or equivalent in MIS, Computer Science, or a related field
4+ years of experience in software development
Strong programming skills in one or more languages (Java, Python, Go or Node.js)
In-depth knowledge of systems architecture, operating system internals and network fundamentals
In-depth knowledge of application design patterns, event-driven architecture, database schemas, and testing strategies
Experience with multi-region application troubleshooting and performance tuning
Working experience with at least one cloud platform (GCP, AWS, or Azure)
Working experience with monitoring techniques and tools (e.g., CloudWatch, Grafana, Prometheus, OpenTelemetry, distributed tracing)