Pindrop is redefining trust in the digital age with its innovative authentication technologies. The Senior Software Engineer will be responsible for designing, building, validating, and operating high-performance services and APIs to ensure reliable and secure authentication at scale.
Responsibilities:
- Where models are part of the system, design model training and inference workflows with clear versioning, lineage, and promotion criteria
- Define service responsibilities, interfaces, and data contracts that evolve safely
- Specify behavior under retries, timeouts, partial failures, and dependency degradation
- Choose consistency and durability guarantees that match risk, latency targets, and operational realities
- Design the request path for predictable tail latency and controlled resource usage
- Build and operate high-performance services and APIs that keep authentication reliable, secure, and fast at scale
- Implement distributed services that are safe under concurrency and robust to duplicate and out-of-order events
- Build real-time scoring and decision services with clear input/output contracts and bounded execution time
- Build distributed training pipelines that scale, are reproducible, and produce auditable artifacts
- Build pipelines that move data and model artifacts through validation, promotion, and release
- Define automated quality gates for service changes and releases
- Add checks for data quality, schema/contract adherence, and training-serving consistency where appropriate
- Define acceptance criteria tied to measurable outcomes and production behavior
- Ship changes with staged rollouts and rollback readiness as defaults
- Coordinate multi-service releases with clear cutover and recovery plans
- Use production signals to validate rollouts and trigger rollback when risk is high
- Instrument the full path with metrics, logs, and traces that enable fast detection and diagnosis
- Implement alerting that reflects user impact, not just component health
- Lead incident response for your services, restore service quickly, and communicate clearly during events
- Run post-incident reviews and close follow-ups that measurably reduce recurrence
- Drive reliability work through SLIs, SLOs, and error budgets, and make tradeoffs explicit
- Improve performance and cost through profiling, load testing, and capacity planning
- Raise engineering quality through reviews, standards, and simplification of operationally expensive designs
- Align across teams on interfaces, data contracts, and reliability expectations to reduce coordination friction
- Evaluate new approaches when they materially improve security, performance, delivery safety, or operational simplicity
Requirements:
- 5–7 years of software development experience
- Experience designing and implementing highly scalable cloud-based APIs
- Experience with multiple programming languages, such as Python and Go
- Expertise in data structures, algorithms, and concurrency
- Experience building and operating real-time distributed systems, including patterns for resilient services such as backpressure, idempotency, timeouts, and retry or circuit-breaking strategies
- 2+ years of experience applying DevOps practices to SaaS deployment, including hands-on use of Jenkins and GitHub Actions, implementation and maintenance of CI/CD pipelines, and management of applications in a multi-container environment such as Kubernetes
- Knowledge of different data storage technologies, such as Redis and MySQL
- Knowledge of Docker and container orchestration frameworks such as Kubernetes
- Experience developing and maintaining services using AWS native products such as Kinesis, DynamoDB, and S3
- Experience with observability and monitoring tools such as Prometheus, Grafana, and cloud-based logging and tracing services
- Linux proficiency
- You have built and operated distributed production systems at scale
- You write systems that handle concurrency, duplication, and out-of-order events without surprising behavior
- You design for explicit failure modes, safe retries, and stable contracts
- You can design for predictable tail latency and controlled resource usage in real-time request paths
- You ship safely with staged rollouts, rollback readiness, and change discipline
- You use observability to reason about production and instrument systems for fast detection and diagnosis
- You can lead incident response and post-incident reviews and drive long-term reliability improvements
- You have managed model lifecycle in production: versioning, validation, staged rollout, rollback, and outcome-tied monitoring
- You have built or operated distributed training pipelines with reproducibility, lineage, and controlled promotion
- You understand drift and training-serving skew risks and mitigate them with contracts, tests, and monitoring
- You have built or operated real-time inference services in production
- You communicate clearly, document decisions, and drive alignment on tradeoffs and success criteria
- You can take ambiguous problems, define scope, and deliver steady progress while keeping the quality bar high
- You actively seek out and remove unnecessary complexity, understanding that simplicity is a prerequisite for reliability, security, and velocity at scale
- Familiarity with voice authentication, fraud detection, or deepfake detection is a plus, not a requirement
- Experience working with production ML systems and MLOps (for example, model deployment, feature pipelines, experiment tracking, and model or data quality monitoring) is a strong plus, but not required