Archetype AI, formed by a high-caliber team from Google, is developing the world's first AI platform to bring AI into the real world. The company is seeking a highly motivated backend engineer to design and develop performant, scalable, and resilient inference services, working closely with researchers and product teams to bring AI capabilities into production.

Responsibilities:

Architect, implement, and maintain distributed inference serving systems that support high-throughput, low-latency model serving across multiple AI accelerator families and cloud platforms
Enable breakthrough research by providing scientists with high-performance inference infrastructure to develop next-generation models
Continuously optimize inference performance—including batching, caching, and request routing strategies—to maximize compute efficiency under explosive customer growth
Build tooling and observability to monitor system health, identify bottlenecks, and proactively resolve instability
Introduce new techniques, architectures, and best practices to push the limits of scalability, efficiency, and reliability
Own problems end-to-end—from design to deployment—with a strong bias toward quality, automation, and continuous improvement
Balance rapid iteration on early-stage systems with long-term maintainability and architectural soundness
Contribute to a culture of engineering excellence, mentorship, and team-first collaboration

Staff Backend Software Engineer: Inference
