Thinking Machines Lab is dedicated to empowering humanity through advancing collaborative general intelligence. They are seeking generalist infrastructure and systems engineers to build the systems that power their foundation models and support research and product development teams in creating and shipping AI products.
Responsibilities:
- We interview for generalist roles; during project selection, we’ll weigh your interests and experience alongside organizational needs. This flexible approach lets us match engineers with the infrastructure teams where they'll have the greatest impact and growth potential.
- Core Infrastructure: We support the teams that research, train, and ultimately serve AI models, and we build the underlying cluster infrastructure needed to train frontier models reliably and safely. Examples might include building systems and running large Kubernetes clusters with GPU workloads, or building infrastructure to support Tinker.
- Data Infrastructure: We build and maintain the data systems for our research and products. You'll design and optimize data pipelines using tools like Spark and other modern data infrastructure technologies. You’ll build scalable, reliable data infrastructure while embedding governance best practices.
- Developer Productivity: We care deeply about research and engineering productivity and our ability to continue shipping quickly. We build tooling, systems, and frameworks to make sure everyone gets a well-configured, optimized developer environment.