Attain Finance is a leading consumer credit lender with over 50 years of expertise in providing credit solutions. They are seeking a strategic, people-first Site Reliability Engineering Manager to lead their SRE team in building scalable, resilient infrastructure while empowering engineers and driving operational excellence.
Responsibilities:
- Lead and mentor a team of SREs, fostering growth, accountability, and cross-functional collaboration
- Drive incident management, postmortem analysis, and root cause remediation
- Architect scalable monitoring, alerting, and automation frameworks using tools like Grafana, ThousandEyes, and GoAlert
- Configure and maintain storage alarms, ensuring proactive capacity management and system health
- Own the reliability roadmap: SLAs, SLOs, error budgets, and performance metrics
- Partner with Engineering, Infrastructure, IT Support, and Desktop Services to ensure seamless service delivery
- Manage and optimize JAMS batch workflows for change tracking, approvals, and operational transparency
- Champion infrastructure-as-code, CI/CD pipelines, and cloud-native reliability practices
- Oversee Change Advisory Board (CAB) meetings, ensuring changes are reviewed, documented, and aligned with reliability goals
Requirements:
- 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles
- 2+ years of experience managing technical teams with a focus on mentorship and collaboration
- Strong proficiency in observability tools (Grafana, ThousandEyes) and alerting systems (GoAlert)
- Experience configuring storage alarms in Grafana and CloudWatch; familiarity with Azure Monitor
- Experience with JAMS workload automation, including failure recovery and job management
- Deep understanding of distributed systems, service-level objectives, and incident response frameworks
- Demonstrated familiarity with Python, SQL, and PowerShell to support oversight of critical automation and tooling maintained by the SRE team
- Familiarity with ITSM platforms (Ivanti, ServiceNow) and device management tools (Intune, Jamf)
- Demonstrated experience contributing to or leading CAB processes, with a focus on change control, risk mitigation, and stakeholder communication
- Excellent communication skills, with the ability to translate technical insights into executive-ready narratives
- Proven ability to drive cross-functional initiatives and foster a culture of reliability and ownership
- Proven track record in cost optimization and strategic standardization
- Experience managing hybrid cloud environments and global endpoint fleets
- Familiarity with endpoint telemetry and predictive failure forecasting
- Passion for building inclusive, high-performing teams and mentoring future leaders