FloSports is a world-class sports media company focused on providing essential coverage for passionate sports fans. The Staff Site Reliability Engineer will lead the technical architecture and execution of infrastructure migrations, ensuring reliability and enabling developers to ship features efficiently through automation and tooling.

Responsibilities:

Lead the technical architecture and execution of our landmark migration from a legacy GCP environment to a modern, scalable infrastructure on AWS EKS
Architect, design, and drive our core infrastructure, defining the patterns for Terraform and GitOps that the rest of the organization will follow
Champion and drive our SLO-driven culture, setting the strategy for how we define, measure, and implement SLOs for critical user journeys, guided by the four Golden Signals (Latency, Traffic, Errors, and Saturation)
Lead the design and development of critical tooling and automation in Node.js and Go to solve entire classes of problems for our developers
Lead the architectural evolution of our in-house, K6-based load testing platform, ensuring it can scale to meet future product demands
Act as a primary subject matter expert for our Istio service mesh, driving its architecture, adoption, and optimization
Spearhead and own high-priority initiatives, including the development of agentic workflows and intelligent automation for SRE domains like proactive scaling and automated remediation
Act as a technical leader by participating in our blameless on-call rotation, mentoring other engineers through complex incidents and ensuring all post-mortems lead to systemic, long-term improvements

Requirements:

Extensive Experience: 8-10+ years in SRE, DevOps, or Software Engineering, with a proven track record of operating at a Staff level
Proven Technical Leadership: You have a history of mentoring other senior engineers, influencing technical direction across multiple teams, and leading large-scale projects to completion
Expert Coder: You are a polyglot with deep expertise in Go or in JavaScript/TypeScript on Node.js, and a history of building and maintaining critical automation and services
Kubernetes Architect: You have an expert-level, architectural understanding of Kubernetes (EKS preferred), including networking, custom controllers, and control plane optimization
Infrastructure as Code Expert: You are a Terraform expert who has designed and implemented large-scale, reusable, and secure IaC frameworks, not just consumed them
Observability Architect: You have designed and implemented observability strategies from the ground up, leveraging platforms like Datadog to create actionable SLOs and provide deep system insight
CI/CD Architect: You have designed, built, and scaled complex CI/CD systems (ideally with GitHub Actions and self-hosted runners) that are used by an entire engineering organization
Expert Systems Thinker: You can decompose highly ambiguous, complex, cross-functional problems into solvable parts and lead the technical solution from concept to production
Agentic Systems & Intelligent Automation: You have successfully designed and deployed agentic systems or other forms of intelligent automation to solve SRE problems and can speak to the tangible results
Architectural leadership in a large-scale cloud migration (e.g., GCP to AWS)
Performance Testing: Deep experience building or scaling custom load testing frameworks, especially with K6
Istio Expertise: Deep, practical experience managing Istio in a large, multi-tenant production environment
Familiarity with serverless architectures, especially SST
Experience orchestrating the deprecation and removal of legacy configuration management systems
