Archetype AI is developing a platform for integrating AI into real-world applications. The company is seeking a highly motivated Staff Backend Software Engineer to design and develop scalable inference services, working closely with researchers and product teams to implement cutting-edge AI capabilities.
Responsibilities:
- Architect, implement, and maintain distributed inference serving systems that support high-throughput, low-latency model serving across multiple AI accelerator families and cloud platforms
- Enable breakthrough research by providing scientists with high-performance inference infrastructure to develop next-generation models
- Continuously optimize inference performance—including batching, caching, and request routing strategies—to maximize compute efficiency under explosive customer growth
- Build tooling and observability to monitor system health, identify bottlenecks, and proactively resolve instability
- Introduce new techniques, architectures, and best practices to push the limits of scalability, efficiency, and reliability
- Own problems end-to-end—from design to deployment—with a strong bias toward quality, automation, and continuous improvement
- Balance rapid iteration on early-stage systems with long-term maintainability and architectural soundness
- Contribute to a culture of engineering excellence, mentorship, and team-first collaboration
Requirements:
- 7+ years of professional software engineering experience, with a focus on model inference
- Deep understanding of machine learning systems at scale, including load balancing, request routing, or traffic management
- Experience with inference optimization, batching, and caching strategies
- Ability to design APIs and service interfaces for real-time and latency-sensitive use cases
- Experience building and operating production-grade systems at scale in cloud environments (e.g., Azure, AWS, GCP)
- Strong debugging, instrumentation, and observability skills across distributed systems
- Demonstrated ownership of complex technical problems and ability to learn and adapt quickly
- Proven track record of scaling systems through rapid growth, and of rebuilding or refactoring them to meet new demands
- Experience building systems that degrade gracefully under load: backpressure, rate limiting, circuit breaking, bulkheading, and queuing
- Strong understanding of failure modes in distributed systems and mitigation techniques
- Proven experience owning high-availability services (e.g., SLOs, incident response, on-call), including capacity planning and load testing
- Proficiency in multiple programming languages (e.g., Rust, C++, Python)
- Experience designing internal tools or platforms to support developer productivity and experimentation
- Strong product intuition and the ability to collaborate closely with cross-functional teams, including research and design