Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. The lab is seeking an infrastructure research engineer to design, optimize, and maintain the compute foundations that power large-scale language model training, in collaboration with researchers and systems architects.

Responsibilities:

Design and implement custom ML kernels (e.g., CUDA, CuTe, Triton) for core LLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU and accelerator architectures
Design compute primitives that reduce memory bandwidth bottlenecks and improve kernel compute efficiency
Collaborate with research teams to align kernel-level optimizations with model architecture and algorithmic goals
Develop and maintain a library of reusable kernels and performance benchmarks that serve as the foundation for internal model training
Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources
Document and share insights through internal talks, technical papers, or open-source contributions to strengthen the broader ML systems community

Research Engineer, Infrastructure, Kernels
