Archetype AI is developing an AI platform aimed at integrating AI into real-world applications. The company is seeking a highly motivated Staff Backend Software Engineer to design and develop scalable inference services, working closely with researchers and product teams to implement cutting-edge AI capabilities.

Responsibilities:

Architect, implement, and maintain distributed inference serving systems that support high-throughput, low-latency model serving across multiple AI accelerator families and cloud platforms
Enable breakthrough research by providing scientists with high-performance inference infrastructure to develop next-generation models
Continuously optimize inference performance—including batching, caching, and request routing strategies—to maximize compute efficiency under explosive customer growth
Build tooling and observability to monitor system health, identify bottlenecks, and proactively resolve instability
Introduce new techniques, architectures, and best practices to push the limits of scalability, efficiency, and reliability
Own problems end-to-end—from design to deployment—with a strong bias toward quality, automation, and continuous improvement
Balance rapid iteration on early-stage systems with long-term maintainability and architectural soundness
Contribute to a culture of engineering excellence, mentorship, and team-first collaboration

Requirements:

7+ years of professional software engineering experience, with a focus on inference
Deep understanding of machine learning systems at scale, including load balancing, request routing, or traffic management
Experience with inference optimization, batching, and caching strategies
Ability to design APIs and service interfaces for real-time and latency-sensitive use cases
Experience building and operating production-grade systems at scale in cloud environments (e.g., Azure, AWS, GCP)
Strong debugging, instrumentation, and observability skills across distributed systems
Demonstrated ownership of complex technical problems and ability to learn and adapt quickly
Proven track record of scaling systems through rapid growth and rebuilding or refactoring for new demands
Experience building systems that degrade gracefully under load: backpressure, rate limiting, circuit breaking, bulkheading, and queuing
Strong understanding of failure modes in distributed systems and mitigation techniques
Proven experience owning high-availability services (e.g., SLOs, incident response, on-call), including capacity planning and load testing
Proficiency in multiple programming languages (e.g., Rust, C++, Python)
Experience designing internal tools or platforms to support developer productivity and experimentation
Strong product intuition and the ability to collaborate closely with cross-functional teams, including research and design

Staff Backend Software Engineer: Inference
