About this role

Design, implement, and optimize GPU compute kernels to accelerate training and inference for next-generation 3D GenAI models.
Develop and maintain domain-specific libraries and performance-critical components for 3D generation workloads.
Work closely with researchers and infrastructure engineers to identify bottlenecks, benchmark performance, and deliver efficient, production-ready GPU modules.

Hands-on experience with CUDA and GPU programming.
Strong programming skills in C++ and Python.
Solid understanding of parallel programming, performance tuning, and numerical computation.
Experience with quantization, model compression, or other efficiency-oriented model optimization techniques.
Knowledge of computer graphics, rendering pipelines, or geometry processing.
Familiarity with GPU profiling tools (e.g., Nsight, nvprof) or hardware-aware optimization.

Machine Learning Engineer

Key skills