Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. They are seeking an infrastructure research engineer to design, optimize, and scale systems that power large AI models, making inference faster and more reliable.
Responsibilities:
- Work alongside researchers and engineers to bring cutting-edge AI models into production
- Collaborate with research teams to enable high-performance inference for novel architectures
- Design and implement new techniques, tools, and architectures that improve inference latency, throughput, and efficiency
- Optimize the codebase and compute fleet (e.g., GPUs) to fully utilize hardware FLOPs, bandwidth, and memory
- Extend orchestration frameworks (e.g., Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving
- Establish standards for reliability, observability, and reproducibility across the inference stack
- Publish and share learnings through internal documentation, open-source libraries, or technical reports that advance the field of scalable AI infrastructure