FloSports is a world-class sports media company focused on providing essential coverage for passionate sports fans. The Staff Site Reliability Engineer will lead the technical architecture and execution of infrastructure migrations, ensuring reliability and enabling developers to ship features efficiently through automation and tooling.
Responsibilities:
- Lead the technical architecture and execution of our landmark migration from a legacy GCP environment to a modern, scalable infrastructure on AWS EKS
- Architect, design, and drive our core infrastructure, defining the patterns for Terraform and GitOps that the rest of the organization will follow
- Champion our SLO-driven culture and set the strategy for how we define, measure, and implement SLOs for critical user journeys, guided by the four Golden Signals (Latency, Traffic, Errors, and Saturation)
- Lead the design and development of critical tooling and automation in Node.js and Go to solve entire classes of problems for our developers
- Lead the architectural evolution of our in-house, k6-based load testing platform, ensuring it can scale to meet future product demands
- Act as a primary subject matter expert for our Istio service mesh, driving its architecture, adoption, and optimization
- Spearhead and own high-priority initiatives, including the development of agentic workflows and intelligent automation for SRE domains like proactive scaling and automated remediation
- Act as a technical leader by participating in our blameless on-call rotation, mentoring other engineers through complex incidents and ensuring all post-mortems lead to systemic, long-term improvements
Requirements:
- Extensive Experience: 8-10+ years in SRE, DevOps, or Software Engineering, with a proven track record of operating at a Staff level
- Proven Technical Leadership: You have a history of mentoring other senior engineers, influencing technical direction across multiple teams, and leading large-scale projects to completion
- Expert Coder: You are a polyglot with deep expertise in Node.js or Go and a history of building and maintaining critical automation and services
- Kubernetes Architect: You have an expert-level, architectural understanding of Kubernetes (EKS preferred), including networking, custom controllers, and control plane optimization
- Infrastructure as Code Expert: You are a Terraform expert who has designed and implemented large-scale, reusable, and secure IaC frameworks, not just consumed them
- Observability Architect: You have designed and implemented observability strategies from the ground up, leveraging platforms like Datadog to create actionable SLOs and provide deep system insight
- CI/CD Architect: You have designed, built, and scaled complex CI/CD systems (ideally with GitHub Actions and self-hosted runners) that are used by an entire engineering organization
- Expert Systems Thinker: You can decompose highly ambiguous, complex, cross-functional problems into solvable parts and lead the technical solution from concept to production
- Agentic Systems & Intelligent Automation: You have successfully designed and deployed agentic systems or other forms of intelligent automation to solve SRE problems and can speak to the tangible results
- Cloud Migration Leadership: Architectural leadership of a large-scale cloud migration (e.g., GCP to AWS)
- Performance Testing: Deep experience building or scaling custom load testing frameworks, especially with k6
- Istio Expertise: Deep, practical experience managing Istio in a large, multi-tenant production environment
- Serverless: Familiarity with serverless architectures, especially SST
- Legacy Decommissioning: Experience orchestrating the deprecation and removal of legacy configuration management systems