Archetype AI is developing an AI platform for integrating AI into real-world applications. The Staff Backend Engineer will lead the design and scaling of core backend systems and collaborate with ML researchers and product teams to deploy AI models into production.
Responsibilities:
- Lead the architecture, design, and implementation of distributed systems supporting high-throughput, low-latency AI model inference and data services
- Collaborate with ML researchers and product teams to transition experimental models into production-grade systems
- Define technical strategy and best practices for backend systems, including GPU clusters, cloud infrastructure, and distributed data pipelines
- Drive performance optimization, reliability, and operational excellence across large-scale systems
- Build internal tools, monitoring, and observability frameworks to proactively detect and resolve issues
- Introduce innovative architectures, techniques, and automation to maximize scalability, efficiency, and reliability
- Mentor engineers, lead by example, and foster a culture of engineering excellence, knowledge sharing, and collaboration
- Balance rapid iteration on early-stage systems with long-term architectural soundness and maintainability
- Take ownership of end-to-end problem solving, from design through deployment, ensuring high-quality, robust delivery
Requirements:
- 7+ years of professional software engineering experience, with deep expertise in backend or distributed systems
- Strong understanding of distributed systems fundamentals: concurrency, consistency, replication, fault tolerance, and networking
- Experience building and operating production-grade systems at scale in cloud environments (e.g., AWS, GCP, Azure)
- Advanced debugging, instrumentation, and observability skills across complex distributed systems
- Proven ownership of complex technical problems, with the ability to learn and adapt quickly and drive problems to completion
- Experience mentoring engineers and influencing architectural decisions across teams