Dave is a financial app on a mission to build products that level the financial playing field. The Staff Site Reliability Engineer will lead architecture and automation in the Google Cloud Platform (GCP) environment, ensuring reliability, scalability, and security while mentoring the SRE team.
Responsibilities:
- Lead architecture and automation across our GCP environment, ensuring reliability, scalability, security, and thoughtful cost management
- Define and improve SLIs, SLOs, and error budgets using Cloud Monitoring and Datadog — connecting reliability goals to real business outcomes
- Shape our multi-region, disaster recovery, and capacity planning strategies so the platform holds up as we grow
- Design and optimize cloud networking, including VPC architecture, ingress/egress, Cloud Armor, VPN, and DNS to support internal systems, partner integrations, and member-facing services
- Drive infrastructure-as-code and GitOps practices using Terraform, Kubernetes, Helm, and ArgoCD to make deployments predictable and repeatable
- Mentor SREs and infrastructure engineers through design reviews, incident retros, and hands-on collaboration — strengthening technical depth across the team
- Explore practical LLM-driven automation where it meaningfully reduces operational toil and shortens incident resolution time
Requirements:
- 8+ years in software, infrastructure, or site reliability engineering
- 5+ years of hands-on experience operating production systems in GCP (compute, networking, storage, IAM, observability)
- Deep experience with Kubernetes (GKE), Helm, containerization, Terraform (IaC), and ArgoCD
- Strong programming skills in Python, Go, or TypeScript/JavaScript for automation and internal tooling
- Experience defining and operating against SLIs, SLOs, and error budgets
- Strong knowledge of relational and distributed databases (e.g., MySQL, Cloud SQL, Cloud Spanner, Redis), including performance tuning and HA strategies
- Experience leading incident response, root cause analysis, and systemic remediation
- Experience in fintech or regulated environments
- Familiarity with CI tooling (GitHub Actions, Jenkins, Tekton, CircleCI)
- Experience in high-growth startups