RadixArk is an infrastructure-first company building world-class open systems for AI inference and training. The company is seeking a Member of Technical Staff — Inference to design and optimize large-scale AI inference systems, minimizing latency and maximizing throughput across thousands of GPUs.
Responsibilities:
- Design and build large-scale inference systems for frontier AI models
- Optimize latency, throughput, and GPU utilization in production inference
- Develop and improve model serving architectures and runtimes
- Work on batching, scheduling, and memory management strategies
- Collaborate with kernel, compiler, and systems teams on performance optimization
- Debug performance bottlenecks across the stack
- Drive reliability and scalability of inference infrastructure
- Build tooling for observability, profiling, and performance analysis
- Contribute to long-term inference architecture and strategy