Lead and execute large-scale OS modernization efforts, including migrations from RHEL7 to EL8/EL9 across approximately 1,700 systems and virtual machines.
Support configuration management transitions, including the move from Chef to CINC and the migration of legacy packages and configurations from yinst to RPM.
Build, maintain, and configure RPM packages to support infrastructure modernization and application migration efforts.
Develop, execute, and improve automated runbooks for OS upgrades, configuration changes, service onboarding, and production support.
Triage, own, and resolve complex production issues, including high-priority S-bugs and infrastructure-related incidents.
Harden CI/CD pipelines, observability frameworks, and rollout/rollback mechanisms for legacy-to-modern infrastructure transitions.
Partner closely with CloudTech SRE to provide follow-the-sun Tier-2 production support, including hands-on incident response and break/fix operations.
Onboard services to modern monitoring, logging, and observability stacks.
Support migrations from legacy monitoring tools, such as Yamas, to modern platforms like Chronosphere, Prometheus, and Grafana.
Assist with log management and Splunk integration strategies.
Partner with application development teams during cloud cutovers, component migrations, and production readiness activities.
Automate repetitive operational tasks using Python and related tooling.
Document technical procedures, runbooks, migration steps, and operational standards.
Requirements
5+ years of professional software engineering, production engineering, SRE, DevOps, or infrastructure engineering experience.
Strong hands-on experience with Python for automation, tooling, scripting, and operational workflows.
Experience supporting Linux infrastructure in production environments, ideally including RHEL7, EL8, and EL9.
Experience with OS modernization, infrastructure migration, or large-scale systems upgrade initiatives.
Hands-on experience with package management and build processes, preferably including RPM packaging.
Experience with configuration management tools such as Chef, CINC, Ansible, Puppet, or similar platforms.
Strong understanding of production support, incident response, break/fix workflows, and Tier-2 operations.
Experience hardening CI/CD pipelines and supporting safe rollout/rollback processes.
Familiarity with observability, monitoring, logging, and alerting frameworks.
Ability to work independently, prioritize and manage technical tasks, and communicate clearly with engineering teams and stakeholders.
Strong documentation skills and the ability to create repeatable runbooks and operational procedures.