Helius is building the core infrastructure for Solana, empowering developers to create the next generation of crypto-powered applications. The Staff Backend Engineer will own and evolve Gatekeeper, Helius’s edge gateway, making the architectural decisions that improve its performance and reliability.
Responsibilities:
- Lead the technical direction for Gatekeeper as the unified entry point for Helius traffic, with an emphasis on p50/p99 latency and tail reliability
- Design and implement routing and load balancing strategies across regions and backend pools, including failover behavior and graceful degradation
- Improve connection handling end-to-end: TLS termination, keepalives, pooling, timeouts, backpressure, and request/response streaming behavior
- Build robust, operator-friendly observability: SLOs, dashboards, alerts, and "is it healthy?" views that make issues quick to diagnose
- Partner with internal service teams to define and enforce contracts (timeouts, retries, error mapping, capacity signals), and reduce systemic failure modes
- Drive hardening work across security and abuse controls (auth failure behavior, rate limiting and cap enforcement, request validation)
- Own production operations for Gatekeeper: incident response, on-call improvements, runbooks, and post-incident follow-through
- Mentor engineers and raise the bar on performance engineering, operational rigor, and code quality
Requirements:
- Significant experience building and operating high-throughput backend systems in production (proxies, gateways, distributed services, or infra-heavy platforms)
- Deep understanding of networking fundamentals and HTTP behavior (TLS, TCP, connection reuse, proxies, load balancers, timeouts)
- Strong performance engineering skills: profiling, benchmarking, and making rigorous latency/throughput tradeoffs
- Track record of leading ambiguous, cross-team projects and shipping durable systems
- Operational excellence: you have owned services with real on-call responsibility, and you make them easier to run over time
- Excellent communication: you can write clear design docs, align stakeholders, and make decisions legible
- Rust experience (or strong interest in working close to the metal for performance-critical systems)
- Experience with anycast, multi-region traffic management, or edge deployments
- Familiarity with WebSockets at scale and the operational challenges that come with long-lived connections
- Experience building internal platforms that standardize observability, incident response, and service reliability
- Interest in Solana / crypto infrastructure, market data, or latency-sensitive trading systems