Sezzle is a company on a mission to financially empower the next generation by revolutionizing the shopping experience. They are seeking a Principal Site Reliability Engineer to architect and build scalable infrastructure solutions while driving reliability and operational excellence across their systems.
Responsibilities:
- Design, architect, build, and upgrade scalable infrastructure solutions leveraging Kubernetes, AWS, RDS (MySQL/Postgres), and modern distributed patterns
- Help drive the infrastructure team’s roadmap, leading us to higher levels of reliability, recoverability, and scalability
- Drive capacity planning, benchmarking, and work with the team to stress test our systems, find bottlenecks, and prepare for further growth in the business
- Define, maintain, and enforce SLAs and alerts across our infrastructure
- Lead the teams toward stronger-signal anomaly detection and better, more flexible alerting
- Help lead Sezzle’s AI enablement efforts, identifying opportunities to apply AI and automation to enhance infrastructure reliability, developer productivity, and internal tooling
- Build consistency and scalability into a distributed microservices architecture while maintaining performance and reliability
- Establish and evolve engineering best practices for observability, security, and CI/CD across teams
- Mentor engineers and champion a culture of learning, innovation, and operational excellence
- Collaborate cross-functionally to translate business goals into technical roadmaps and deliver results that matter
Requirements:
- 15+ years of professional software engineering or infrastructure engineering experience, including significant SRE and backend experience
- Deployed significant changes to a production application or infrastructure configuration in the past 30 days
- Expertise with SQL-based RDBMS (MySQL, PostgreSQL) and experience optimizing schema and queries for performance at scale
- Proficiency in observability tools (Prometheus, Grafana, Datadog, New Relic)
- Solid understanding of distributed systems design patterns (e.g., transactional outbox, event-driven architecture, stream processing, and queues)
- Demonstrated ability to bring new ideas forward, influence decisions, and lead complex technical initiatives
- Bachelor's degree in Computer Science or equivalent practical experience
- Experience with AWS cloud infrastructure, particularly AWS Aurora RDS for both MySQL and Postgres
- Experience with data engineering, data pipelines, and data warehousing
- Experience with CI/CD pipelines and deploying containerized microservices in Kubernetes
- Familiarity with AI developer tools such as Claude Code, Gemini CLI, Codex, and Cursor, and experience using them to be a more productive engineer
- Strong proficiency in Golang, with experience building and maintaining RESTful APIs
- Track record of shipping commercial APIs and data-driven applications in high-growth environments
- Proven leadership in guiding technical direction, improving system reliability, and scaling high-traffic services