Role Overview

Collaborate with exceptional engineers on building systems and services for the world's largest companies. This platform powers millions of production websites and supports massive scale, including over 1% of global Internet traffic and more than 10 billion monthly visits.
Lead architecture for distributed services at scale that synchronize shared state across clients, including clear correctness guarantees (e.g., ordering, idempotency, convergence). These services require low latency and high availability, with an SLO of 99.99% uptime.
Define concurrency and conflict-resolution semantics for simultaneous changes to shared state, including trade-offs and constraints.
Design for failure: retries, partial outages, reconnection, and safe recovery paths, with explicit degradation behavior.
Own operational excellence: define SLIs/SLOs, instrument tracing/metrics/logging, and drive reliability improvements through incident learning.
Drive cross-team technical alignment via design docs and decision records; unblock execution across org boundaries.
Raise the bar through design and code reviews, mentoring, and pragmatic standardization that increases leverage.
Deliver maintainable, tested, performant systems and evolve them with a “crawl, walk, run” plan.
Use modern tooling (including AI-assisted coding, debugging, and code review) to improve developer velocity and reduce time-to-diagnosis in production.
Participate in engineering citizenship activities such as co-authoring engineering blogs, strengthening and improving our hiring processes, and leading internal hackathon teams.

Requirements

BA/BS degree or equivalent experience
At least 7 years, and preferably 10+, of experience building and operating large-scale production distributed systems where latency, correctness, and reliability (99.99% uptime) are non-negotiable.
Deep backend systems experience in one or more modern server environments (e.g., Java, Go, Rust, Python, Node.js), with the ability to ramp up and adapt quickly in new stacks.
Expertise with distributed systems, concurrency, scaling, and debugging multi-layer systems.
Strong operational judgment: you define SLIs/SLOs, build observability, and improve systems via incidents and feedback loops, not heroics.
Staff behaviors: you lead multi-team initiatives, write decision-quality design docs, influence architecture beyond your immediate team, and communicate across the organization.
Ability to make decisions with incomplete information, understand and communicate one-way vs. two-way doors, and move with urgency while keeping critical code operational.
Stay curious and open to growth — actively building fluency in emerging technologies like AI to unlock creativity, accelerate progress, and amplify impact.

Tech Stack

Distributed Systems
Java
JavaScript
Node.js
Python
Rust
Go

Benefits

Ownership in what you help build. Every permanent Webflower receives equity (RSUs) in our growing, privately held company.
Health coverage that actually covers you. Comprehensive medical, dental, and vision plans for full-time employees and their dependents, with Webflow covering most premiums.
Support for every stage of family life. 12 weeks of paid parental leave for all parents and 6+ weeks of additional paid leave for birthing parents. Plus inclusive care for family planning, menopause, and midlife transitions.
Time off that’s actually off. Flexible vacation, paid holidays, and a sabbatical program to help you recharge and come back inspired.
Wellness for the whole you. Access to mental health resources, therapy and coaching.
Invest in your future. A 401(k) with 100% employer match (up to $6,000/year) in the U.S., and support for retirement savings globally.
Monthly stipends that flex with your life. Localized support for work and wellness expenses — from Wi-Fi to workouts.
Bonus for building together. All full-time, permanent, non-commission employees are eligible for our annual WIN bonus program.

Staff Distributed Systems Engineer – Collaboration
