RadixArk is an infrastructure-first company building world-class open systems for AI inference and training. The company is seeking a Member of Technical Staff — Inference to design and optimize large-scale AI inference systems, minimizing latency and maximizing throughput across thousands of GPUs.
Responsibilities:
- Design and build large-scale inference systems for frontier AI models
- Optimize latency, throughput, and GPU utilization in production inference
- Develop and improve model serving architectures and runtimes
- Work on batching, scheduling, and memory management strategies
- Collaborate with kernel, compiler, and systems teams on performance optimization
- Debug performance bottlenecks across the stack
- Drive reliability and scalability of inference infrastructure
- Build tooling for observability, profiling, and performance analysis
- Contribute to long-term inference architecture and strategy