Fundraise Up is a global fundraising platform that aims to make donating to nonprofits fast and accessible. The DevOps Engineer / SRE will be responsible for the stability, performance, and security of the server infrastructure, working with on-premise systems and driving automation projects.
Responsibilities:
- Work primarily with on‑premise infrastructure (bare metal and VMs): setup, maintenance, troubleshooting
- Drive clarity in ambiguous situations by defining requirements, assumptions, and next steps
- Own automation projects end‑to‑end (design → rollout → maintenance)
- Improve how we operate: harden and tune systems, and raise the team's operational hygiene
- Keep the platform stable, fast, and secure: servers, web servers, databases, queues
- Investigate production incidents across OS / networking / infrastructure layers, apply temporary mitigations, coordinate with developers, and participate in post‑mortems
- Participate in on‑call rotations
- Use AI in all aspects of day‑to‑day work: researching, troubleshooting, developing
Requirements:
- 4+ years as a DevOps Engineer / SRE (or in a role with very similar responsibilities)
- Real, hands-on experience with servers (VMs, bare metal) at the OS level and below: configuring, troubleshooting, digging into 'why it's broken'
- Confident Linux skills (we use Ubuntu). We expect you to be comfortable with the core tools from Linux Crisis Tools
- Solid understanding of networking basics; ability to configure and troubleshoot iptables
- Ansible + Git
- Experience with Bash or Python scripting for automation/observability
- Production/on-call experience: diagnosing incidents, restoring service, participating in post-mortems
- Ownership and attention to detail. Downtime is expensive: five years ago, 10 minutes of downtime cost us $100k, and today it costs even more
- ClickHouse, MongoDB: what each database is used for, monitoring, troubleshooting performance and slow queries, sharding
- Kafka: operating clusters at scale (topic moves, broker replacements, tuning)
- Redis: high-load tuning, replication, sharding, performance monitoring
- Elasticsearch: configuration, scaling, sharding/cluster management
- HAProxy / Nginx: load balancing, SSL/TLS, caching, reverse proxying, performance monitoring
- OS tuning: kernel/network stack/filesystem parameters for high-load systems
- Full disk encryption on LVM: we use Clevis + Tang in production
- Infrastructure Security: Teleport, HashiCorp Vault