Design, implement, and optimize GPU computing kernels to accelerate model training and inference for next-generation 3D GenAI models.
Develop and maintain domain-specific libraries and performance-critical components for 3D generation workloads.
Work closely with researchers and infra engineers to identify bottlenecks, benchmark performance, and deliver high-efficiency, production-ready GPU modules.
Requirements
Hands-on experience with CUDA and GPU programming.
Strong programming skills in C++ and Python.
Solid understanding of parallel programming, performance tuning, and numerical computation.
Experience with quantization, model compression, or other efficiency-oriented model optimization techniques.
Knowledge of computer graphics, rendering pipelines, or geometry processing.
Familiarity with GPU profiling tools (e.g., Nsight, nvprof) or hardware-aware optimization.
Tech Stack
Python
Benefits
We value intelligence and the pursuit of knowledge.
We care deeply about our work, our users, and each other.
We trust our instincts and are not afraid to take bold risks.