Dave is a financial app on a mission to build products that level the financial playing field. The Staff Site Reliability Engineer will serve as a technical anchor across cloud infrastructure and networking, focusing on reliability, automation, and performance within the platform.
Responsibilities:
- Lead architecture and automation across our GCP environment, ensuring reliability, scalability, security, and thoughtful cost management
- Define and improve SLIs, SLOs, and error budgets using Cloud Monitoring and Datadog, connecting reliability goals to real business outcomes
- Shape our multi-region, disaster recovery, and capacity planning strategies so the platform holds up as we grow
- Design and optimize cloud networking, including VPC architecture, ingress/egress, Cloud Armor, VPN, and DNS, to support internal systems, partner integrations, and member-facing services
- Drive infrastructure-as-code and GitOps practices using Terraform, Kubernetes, Helm, and ArgoCD to make deployments predictable and repeatable
- Mentor SREs and infrastructure engineers through design reviews, incident retros, and hands-on collaboration — strengthening technical depth across the team
- Explore practical LLM-driven automation where it meaningfully reduces operational toil and shortens incident resolution time
Requirements:
- 8+ years in software, infrastructure, or site reliability engineering
- 5+ years of hands-on experience operating production systems in GCP (compute, networking, storage, IAM, observability)
- Deep experience with Kubernetes (GKE), Helm, containerization, Terraform (IaC), and ArgoCD
- Strong programming skills in Python, Go, or TypeScript/JavaScript for automation and internal tooling
- Experience defining and operating against SLIs, SLOs, and error budgets
- Strong knowledge of relational and distributed databases (e.g., MySQL, Cloud SQL, Cloud Spanner, Redis), including performance tuning and HA strategies
- Experience leading incident response, root cause analysis, and systemic remediation
- Experience in fintech or regulated environments
- Familiarity with CI tooling (GitHub Actions, Jenkins, Tekton, CircleCI)
- Experience in high-growth startups