Design model training and inference workflows with clear versioning, lineage, and promotion criteria where models are part of the system.
Define service responsibilities, interfaces, and data contracts that evolve safely.
Specify behavior under retries, timeouts, partial failures, and dependency degradation.
Choose consistency and durability guarantees that match risk, latency targets, and operational realities.
Design the request path for predictable tail latency and controlled resource usage.
Build and operate high-performance services and APIs that keep authentication reliable, secure, and fast at scale.
Implement distributed services that are safe under concurrency and robust to duplicate and out-of-order events.
Build real-time scoring and decision services with clear input/output contracts and bounded execution time.
Build distributed training pipelines that scale, are reproducible, and produce auditable artifacts.
Build pipelines that move data and model artifacts through validation, promotion, and release.
Define automated quality gates for service changes and releases.
Add checks for data quality, schema/contract adherence, and training-serving consistency where appropriate.
Define acceptance criteria tied to measurable outcomes and production behavior.
Ship changes with staged rollouts and rollback readiness as defaults.
Coordinate multi-service releases with clear cutover and recovery plans.
Use production signals to validate rollouts and trigger rollback when risk is high.
Participate in on-call rotation, including nights and weekends.
Own after-hours production releases, including rollout validation, monitoring, and rollback execution.
Instrument the full path with metrics, logs, and traces that enable fast detection and diagnosis.
Implement alerting that reflects user impact, not just component health.
Lead incident response for your services, restore service quickly, and communicate clearly during events.
Run post-incident reviews and close follow-ups that measurably reduce recurrence.
Drive reliability work through SLIs, SLOs, and error budgets, and make tradeoffs explicit.
Improve performance and cost through profiling, load testing, and capacity planning.
Raise engineering quality through reviews, standards, and simplification of operationally expensive designs.
Align across teams on interfaces, data contracts, and reliability expectations to reduce coordination friction.
Evaluate new approaches when they materially improve security, performance, delivery safety, or operational simplicity.
Requirements
5–7 years of software development experience.
Experience designing and implementing highly scalable cloud-based APIs.
Experience with multiple programming languages, such as Python and Go.
Expertise in data structures, algorithms, and concurrency.
Experience building and operating real-time distributed systems, including patterns for resilient services such as backpressure, idempotency, timeouts, and retry or circuit-breaking strategies.
Experience working with production ML systems and MLOps (for example, model deployment, feature pipelines, experiment tracking, and model or data quality monitoring) is a strong plus but not required.
2+ years of experience applying DevOps practices to the deployment of SaaS services, including hands-on experience with Jenkins and GitHub Actions, implementing and maintaining CI/CD pipelines, and managing applications in a multi-container environment such as Kubernetes.
Knowledge of data storage technologies such as Redis and MySQL.
Knowledge of Docker and container orchestration frameworks such as Kubernetes.
Experience developing and maintaining services built on AWS-native products such as Kinesis, DynamoDB, and S3.
Experience with observability and monitoring tools such as Prometheus, Grafana, and cloud logging and tracing.
Linux proficiency.
Tech Stack
AWS
Cloud
Distributed Systems
Docker
DynamoDB
Grafana
Jenkins
Kubernetes
Linux
MySQL
Prometheus
Python
Redis
Go
Benefits
Competitive compensation, including equity for all employees
Unlimited Paid Time Off (PTO)
Generous health and welfare plans to choose from, including one employer-paid “employee-only” plan!
Best-in-class Health Savings Account (HSA) employer contribution
Affordable vision and dental plans for you and your family
Employer-provided life and disability coverage with additional supplemental options
Paid Parental Leave, equal for all parents, including birth, adoptive, and foster parents
One year of diaper delivery for your newest family member! It’s our way of welcoming new Pindroplets to the family!
Identity protection through Norton LifeLock
Recurring monthly Phone and Internet allowance
One-time home office allowance
Remote-first environment, meaning you have flexibility in your day!
Company holidays
Annual professional development and learning benefit
Pick your own Apple MacBook Pro
Retirement plan with competitive 401(k) match
Wellness Program, including an Employee Assistance Program and 24/7 telemedicine