SentinelOne is a pioneering company at the intersection of AI and security, dedicated to building a safer future for humanity. The company is seeking a Staff Backend Software Engineer to design and develop backend services for its on-premises deployments, with a focus on operational correctness and reliability for enterprise customers.
Responsibilities:
- Design and develop backend services in Python (Flask, SQLAlchemy, gevent) that run in customer-managed environments, with a strong focus on operational correctness and upgrade safety
- Own the deployment lifecycle: build and maintain Docker images, and ensure services start correctly across a wide range of customer infrastructure configurations
- Build and evolve REST and gRPC APIs consumed by both internal services and external management consoles, maintaining strict backwards-compatibility contracts as the platform scales
- Work closely with the database layer (PostgreSQL and MongoDB): write Alembic migrations that run safely in production, handle schema evolution without downtime, and keep query performance healthy under load
- Drive observability improvements by instrumenting services with OpenTelemetry, defining SLOs, and making sure operators can diagnose issues in environments where SentinelOne has limited visibility
Requirements:
- 8+ years of backend engineering experience with Python in a production microservices environment, including deep familiarity with Flask, SQLAlchemy, and concurrency patterns (gevent or asyncio)
- Hands-on experience packaging and deploying containerized services with Docker and Kubernetes, including writing Helm charts and reasoning about upgrade paths across multiple deployed versions
- Strong PostgreSQL skills including schema design, query optimization, and writing zero-downtime migration scripts using Alembic or equivalent tools
- Experience building and maintaining gRPC and REST APIs with explicit versioning strategies and backwards-compatibility guarantees, preferably in environments where breaking changes are costly
- Solid understanding of observability: structured logging, distributed tracing with OpenTelemetry or equivalent, and building dashboards in Grafana or similar tools