PrizePicks is the fastest-growing sports company in North America, recognized for its leading platform in Daily Fantasy Sports. The Senior Site Reliability Engineer will ensure the reliability, scalability, and performance of the infrastructure, while also leading incident response and mentoring other engineers.
Responsibilities:
- Design, implement, maintain, and monitor reliable production systems at scale
- Lead incident response, mitigate production issues, and conduct postmortem analyses
- Proactively monitor performance, analyze system failures, identify bottlenecks, and propose solutions
- Create and support observability/monitoring tools and vendor integrations
- Drive the growth of a reliability culture, promoting cross-functional collaboration towards improving system reliability, scalability, resilience, and security
- Train and mentor other engineers
Requirements:
- 5+ years of experience as a reliability-focused engineer in a fast-paced, rapidly growing enterprise environment
- Deep understanding of tooling and application development in the following areas:
  - Cloud computing platforms such as AWS, Azure, and/or GCP
  - Infrastructure-as-code tools such as Terraform or Crossplane
  - Developing applications in languages such as Python, Ruby, or Go
  - Deploying and supporting applications in Kubernetes at scale
  - Implementing monitoring in tools such as Grafana, New Relic, or Datadog
- Experience debugging live, critical production issues
- Familiarity with reliability principles, such as resilient systems, application and supply chain security, and SLO governance
- Ability to work cross-functionally with diverse engineering teams
- Candidates based in Atlanta are preferred, but we are open to qualified applicants from anywhere in the U.S.