Lead and mentor a team of reliability engineers to drive operational excellence across Kohl’s distributed systems
Develop and implement reliability strategies, collaborate closely with engineering teams, and ensure SRE best practices are embedded throughout the software development lifecycle
Conduct design reviews, implement robust monitoring and alerting and establish auto-healing practices
Provide leadership and guidance during critical incidents to triage, troubleshoot and resolve complex issues
Drive comprehensive root cause analysis and follow-through on preventative measures
Manage the software lifecycle, driving reliability, observability and efficiency in collaboration with peers across Design, Product Management, and Engineering
Lead major automation and toil reduction initiatives, simplifying the ecosystem and reducing risks
Set the vision and drive cultural transformation within the team
Coach the team with empathy and hands-on mentoring
Develop and deliver training programs to upskill the team and broaden SRE adoption across the organization
Requirements
Bachelor's Degree or equivalent in MIS, Computer Science or a related field
6+ years of experience in software development and 2+ years of progressive leadership experience, mentoring diverse teams
In-depth knowledge of application design patterns, event-driven architecture, database schemas and testing strategies
Demonstrated knowledge of systems architecture, operating system internals and networking
Proven experience with multi-region application troubleshooting and performance tuning
Demonstrated experience working with at least one cloud platform (GCP, AWS, or Azure) and with hybrid cloud environments
In-depth knowledge of and experience with continuous integration, continuous deployment and test-driven development
Strong programming skills in one or more languages (Java, Python, Go or Node.js)