Help build the world's largest end-to-end 3D-native machine learning systems.
Build the end-to-end ML framework dedicated to 3D, from pretraining to finetuning and inference.
Work closely with researchers to co-design the next frontier of 3D & Spatial AI.
Build and debug on top of modern PyTorch, for maximum parallelism and efficiency.
Identify bottlenecks and optimize for high-throughput, efficient distributed model training.
Implement and maintain 3D-specific custom operators in Triton or CUDA.
Build efficient inference endpoints with complex multi-stage model pipelines.
Optimize models through compilation, fusion, quantization, etc.

Experience in machine learning or high-performance graphics.
Solid practical understanding of at least one machine learning framework (e.g. PyTorch, JAX).
Strong ability to write beautiful and maintainable code in Python and/or C++.
Ability to learn fast and dive into new concepts or complex codebases.
Performance- and efficiency-oriented mindset, with close attention to the smallest details.
Strong communication skills for working in a globally distributed team.
A strong passion for digging into PyTorch internals, with hands-on experience in areas like torch.compile and the fully_shard (FSDP2) APIs (nice to have).
Experience with building Triton kernels (nice to have).
Experience with large-scale distributed training and familiarity with modern parallelization techniques: DP, TP, CP, PP, zero redundancy optimizers, etc. (nice to have).
Experience with diffusion models in 3D or video (nice to have).
Experience with low-precision (bf16 or fp8) training (nice to have).

ML System Engineer, Generative AI

Key skills