Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. The company is seeking an infrastructure research engineer to design, optimize, and maintain the compute foundations that power large-scale language model training, working closely with researchers and systems architects.
Responsibilities:
- Design and implement custom ML kernels (e.g., CUDA, CuTe, Triton) for core LLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU and accelerator architectures
- Design compute primitives that reduce memory bandwidth bottlenecks and improve kernel compute efficiency
- Collaborate with research teams to align kernel-level optimizations with model architecture and algorithmic goals
- Develop and maintain a library of reusable kernels and performance benchmarks that serve as the foundation for internal model training
- Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources
- Document and share insights through internal talks, technical papers, or open-source contributions to strengthen the broader ML systems community