Stellar Development Foundation is a small team focused on incubating a novel distributed systems prototype. They are seeking a dedicated DevOps Engineer to own the deployment, release, and observability lifecycle of their software and network, ensuring robust, automated infrastructure and operational readiness.
Responsibilities:
- Design and implement multi-region cluster deployments across bare metal and cloud environments
- Build and maintain rigorous CI/CD pipelines
- Ensure deterministic builds, manage versioning, and integrate automated E2E and smoke tests that exercise the system on every merge
- Architect the monitoring stack
- Build out comprehensive Prometheus and Grafana dashboards for node health, consensus progress, and system throughput
- Set up intelligent alerting for network anomalies
- Define the configuration management for node identity and network topology
- Work with the core engineers to actively chaos-test the network: partitioning nodes, simulating localized outages, and verifying automated recovery
- Write the runbooks and establish the procedures for network upgrades, state snapshots, and incident response
Requirements:
- Deep experience in a DevOps, SRE, or Infrastructure role managing complex, highly available systems
- Exceptional command of Infrastructure as Code (Terraform, Ansible) and container orchestration (Kubernetes, Docker)
- Extensive experience building robust CI/CD pipelines (GitHub Actions, GitLab CI, etc.) for compiled languages
- Mastery of modern observability stacks (Prometheus, Grafana, ELK/Loki, Sentry)
- Strong scripting skills (Python, Bash, etc.) and the ability to comfortably read and navigate Rust, Go, and C++ build systems and codebases to understand system behavior, even if you aren't writing core features
- A methodical approach to operational stability and a deep appreciation for system correctness
- Experience operating blockchain networks, validator nodes, or other Paxos/Raft-inspired distributed systems
- Experience with bare-metal provisioning and tuning network I/O for high-performance distributed databases