Develop and maintain core components of the serverless inference platform to ensure high availability and scalability for Cloudflare users.
Optimize the model scheduling system to significantly increase efficiency and resource utilization across our inference infrastructure.
Implement improvements to the inference request routing logic to enhance overall performance and reduce latency for end-users.
Drive significant, measurable improvements in the platform's reliability and resilience by identifying and mitigating systemic risks.
Expand and refine the observability stack (metrics, logging, and tracing) and fine-tune alerting to proactively identify and resolve production issues.
Lead complex, cross-functional technical projects from initial concept and design through deployment and ongoing operation.
Mentor junior engineers and help cultivate a strong, collaborative engineering culture within the team.
Requirements
Experience in systems engineering, with a focus on distributed, high-performance systems.
Expert proficiency in Rust programming, particularly in an asynchronous environment.
Deep understanding and hands-on experience with relevant networking and application protocols (e.g., TCP, HTTP, WebSocket).
Experience with scaling and performance optimization techniques, including load balancing and caching in a distributed environment.
Demonstrable experience with container orchestration platforms, specifically Kubernetes and/or Nomad.
Familiarity with the challenges and architectures involved in large-scale inference serving (e.g., LLMs and diffusion models).