SentinelOne is a pioneering company at the intersection of AI and security, focused on delivering autonomous detection and response for cybersecurity. They are seeking a Senior Staff Software Engineer to own the architecture of their self-hosted Endpoint Protection platform, ensuring high availability, scalability, and reliability in customer-controlled environments.
Responsibilities:
- Own and evolve the architecture of the self-hosted platform across multiple teams, including backend services, data pipelines, control logic, and the deployment topology shipped to customers (containerized microservices on bare metal, delivered as an appliance/OVA), and implement improvements to the existing architecture
- Set the standard for high availability and resilience in customer-controlled deployments (clustering, replication, failover, consensus/leader election, and graceful degradation across single-node and multi-node topologies), and drive infrastructure cost analysis and optimization
- Define scalability and capacity-planning strategies that hold across a wide range of customer scale and hardware, and establish frameworks for performance, observability, and operational excellence in constrained and air-gapped environments
- Lead the translation of SaaS/cloud-native capabilities into on-prem architecture, refine vaguely specified complex requirements into robust, future-proof end-to-end designs, and define coding patterns and standards that span multiple teams
- Influence the engineering roadmap, drive medium-to-large initiatives that span teams, mentor staff and senior engineers, and act as a cross-team technical authority sought out to review the work of others
Requirements:
- A degree in Computer Science or Software Engineering, or equivalent experience
- Approximately 8 or more years of related experience
- Deep hands-on expertise in Go and/or Python
- Experience with technologies such as PostgreSQL, MongoDB, Redis, Kafka, Docker, and Linux
- Extensive, proven experience designing and delivering on-prem / self-hosted / customer-deployed software
- Experience with packaging and lifecycle for customer-controlled environments (appliance/OVA, bare-metal or containerized deployments, upgrades, and air-gapped or restricted networks)
- A strong track record architecting distributed systems for high availability and fault tolerance in on-prem / customer-controlled deployments
- Experience with replication, clustering, failover, and consensus/leader election
- Deep command of scalability and performance in on-prem deployments
- Experience with horizontal/vertical scaling, sharding/partitioning, load balancing, and capacity planning
- Ability to take vaguely specified, complex requirements and create efficient, robust, future-proof end-to-end designs across teams
- Experience influencing a roadmap and leading multi-team designs
- Excellent communication and mentoring skills
- Experience with security products