RadixArk is an infrastructure-first company focused on building world-class open systems for AI inference and training. They are seeking a TPU Systems Engineer to develop high-performance systems using JAX, XLA, and Pallas, optimizing workloads on TPU hardware.
Responsibilities:
- Build high-performance inference and training systems using JAX/XLA/Pallas, including SGLang-JAX
- Push large-model workloads to the limits of the newest TPU hardware
- Optimize end-to-end latency and throughput for LLM serving on TPU infrastructure
- Design and implement SPMD strategies for efficient distributed inference and training
- Design and implement Pallas kernels for operations that need custom low-level control to reach peak performance
- Profile and optimize XLA compilation pipelines and HLO graph transformations
- Collaborate with kernel engineers and compiler teams to achieve performance wins across the stack
- Contribute to open-source projects with TPU optimization guides, benchmarks, and architectural insights