As a Software Engineer: Distributed Systems, you will be part of a Resiliency Organization responsible for the core services that power Cloudflare’s global operations.
We are looking for engineers to join the Infrastructure Intelligence team and shape the transition toward model-driven network orchestration.
The team is building a cutting-edge 'Maintenance Coordination System', powered by an infrastructure dependency graph of one of the world's largest physical networks.
By creating the robust primitives for global coordination today, you will be enabling the next generation of data-driven infrastructure at Cloudflare.
Requirements
A degree in Computer Science, Engineering, Mathematics, Statistics, or a related field, or equivalent relevant background or experience.
Programming experience in Go or a similar language
Experience designing and implementing secure, highly available distributed systems
Experience with (and love for) debugging to ensure the system works in all cases
Experience with a continuous integration workflow and using source control (we use git)
Experience with continuous delivery and deployment of Kubernetes-hosted (k8s) applications
Understanding of security issues and responsibilities
Experience with monitoring, alerting and debugging high volume production systems
Fluency in analyzing data sets such as logs
Strong oral and written communication skills in English
Experience designing and building APIs
Experience with the Cloudflare development stack is a plus.
At least 4 years of hands-on software development experience on meaningfully complex systems.
Experience with graph theory and building services for graph generation, storage and retrieval.
An understanding of the systems architecture required to scale machine learning model-driven decision engines in a production environment
Experience building both backend systems and frontend widgets.
Ability to contribute to planning, development, and execution to meet commitments and deliver with predictability.
Experience implementing tools, processes, internal instrumentation, and methodologies.
Comfortable working on projects with tight deadlines and short release cycles.
Strong verbal and written English language skills.
Experience with DCIM, CMDB, IPAM, and other Data Center and Asset Lifecycle Management tools is a plus.
Experience with data ingestion and analysis, such as pulling metrics from hundreds of edge data centers.
Tech Stack
Distributed Systems
Kubernetes
Go
Benefits
Medical/Rx Insurance
Dental Insurance
Vision Insurance
Flexible Spending Accounts
Commuter Spending Accounts
Fertility & Family Forming Benefits
On-demand mental health support and Employee Assistance Program
Global Travel Medical Insurance
Short and Long Term Disability Insurance
Life & Accident Insurance
401(k) Retirement Savings Plan
Employee Stock Participation Plan
Flexible paid time off covering vacation and sick leave
Leave programs, including parental, pregnancy health, medical, and bereavement leave