Fundraise Up is a global fundraising platform dedicated to making donations to nonprofits seamless and accessible. The DevOps Engineer/SRE will be responsible for the stability, performance, and security of server infrastructure, working primarily with on-premise systems and driving automation projects.

Responsibilities:

Work primarily with on‑premise infrastructure (bare metal and VMs): setup, maintenance, troubleshooting
Drive clarity in ambiguous situations by defining requirements, assumptions, and next steps
Own automation projects end‑to‑end (design → rollout → maintenance)
Improve how we operate: harden and tune systems, and raise the team's operational hygiene
Keep the platform stable, fast, and secure: servers, web servers, databases, queues
Investigate production incidents across OS / networking / infrastructure layers, apply temporary mitigations, coordinate with developers, and participate in post‑mortems
Participate in on‑call rotations
Use AI in all aspects of day‑to‑day work: researching, troubleshooting, developing

Requirements:

4+ years as a DevOps Engineer / SRE (or very close responsibilities)
Real, hands-on experience with servers (VMs, bare metal) at the OS level and below: configuring, troubleshooting, digging into 'why it's broken'
Confident Linux skills (we use Ubuntu). We expect you to be comfortable with the core tools from Linux Crisis Tools
Solid understanding of networking basics; ability to configure and troubleshoot iptables
Ansible + Git
Experience with Bash or Python scripting for automation/observability
Production/on‑call experience: diagnosing incidents, restoring service, participating in post‑mortems
Ownership and attention to detail. Downtime is expensive: five years ago, 10 minutes of downtime cost us $100k, and today it costs even more
ClickHouse, MongoDB: what each database is used for, monitoring, troubleshooting performance and slow queries, sharding
Kafka: operating clusters at scale (topic moves, broker replacements, tuning)
Redis: high‑load tuning, replication, sharding, performance monitoring
Elasticsearch: configuration, scaling, sharding/cluster management
HAProxy / Nginx: load balancing, SSL/TLS, caching, reverse proxying, performance monitoring
OS tuning: kernel/network stack/filesystem parameters for high‑load systems
Full Disk Encryption on LVM: we use Clevis + Tang in production
Infrastructure Security: Teleport, HashiCorp Vault
VictoriaMetrics and how it differs from the Prometheus stack
Complex CI/CD pipelines. We use scripted Jenkins pipelines
Bare‑metal Kubernetes: provisioning, networking (MetalLB or alternatives), isolation from the internet, scaling across providers (like OVH, Hetzner) and integration with existing infrastructure
Flux and GitOps
Terraform
