Pindrop is redefining trust in the digital age with innovative voice and video authentication technologies. They are seeking a Staff Software Engineer to lead the development of real-time authentication systems, ensuring reliability and operational excellence in a production environment.
Responsibilities:
- Own one or more core authentication domains end-to-end across design, implementation, migrations, deprecations, and longer-term technical direction
- Design and operate real-time distributed authentication services as cloud-native, containerized microservices with explicit behavior under load, partial failure, and degraded dependencies
- Take full operational ownership, including on-call coverage on nights and weekends, after-hours releases, incident response, postmortems, and lasting fixes
- Define and execute safe change strategies for auth and model releases through staged rollouts, production validation, clear rollback criteria, and rehearsed playbooks
- Work with research and ML teams to ship models into auth products and to use language-model-based tools where they measurably improve incident handling, log and metric analysis, runbooks, or policy and rules workflows
- Lead cross-functional initiatives that standardize auth APIs and policies, simplify ML-powered decision paths, and reduce operational and integration overhead across engineering, product, research, and customer-facing teams; back those initiatives with empirical evidence tied to clear business outcomes, from design through adoption
Requirements:
- 8+ years of software development experience
- Significant experience designing and operating latency-sensitive backend APIs or services at scale in domains such as authentication, payments, or risk
- Hands-on production MLOps experience with reproducible training, data and model versioning, promotion gates, online monitoring tied to business outcomes, and rollback for regressions
- Experience using large language model-based tooling (AI-augmented design and development with, e.g., Claude Code or Codex) or similar techniques in production or internal workflows such as incident and log summarization, runbook and documentation assistance, or structured decision support, with attention to evaluation, guardrails, and failure modes
- Strong operational instincts, including ownership of on-call for critical services, leading incidents, and turning runbook, alert, and SLO work into durable reliability patterns rather than one-off fixes
- Strong programming and debugging skills in at least one modern backend language (Go and Python are common in our stack), and comfort designing and debugging cloud-native, containerized, and asynchronously connected services using queues, streams, or workflows
- Familiarity with identity, security, or fraud detection domains is a plus but not required