Profile end-to-end neural reconstruction workflows and identify bottlenecks across data loading, initialization, training, rendering, evaluation, and export
Improve CUDA and PyTorch performance for Gaussian Splatting and neural reconstruction workloads
Analyze GPU performance using tools such as Nsight Systems, Nsight Compute, NVTX, PyTorch Profiler, CUDA events, and benchmark dashboards
Optimize sparse and irregular rendering workloads
Validate that performance improvements preserve reconstruction quality, numerical behavior, camera/lidar correctness, and production reliability
Build repeatable benchmarks, regression tests, and profiling workflows to catch performance and quality regressions early
Collaborate with researchers, CUDA engineers, ML engineers, and production teams to turn promising prototypes into maintainable, reviewable, production-quality code
Requirements
BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Math, Robotics, Computer Vision, Machine Learning, or a related field, or equivalent experience
12+ years of relevant experience
Strong programming skills in Python and C++
Hands-on experience with PyTorch or a similar tensor/autograd framework
Experience optimizing GPU-accelerated workloads using CUDA, C++/CUDA extensions, or related GPU programming approaches
Practical experience with profiling and performance analysis
Ability to develop benchmarks and validate that optimizations preserve correctness, numerical behavior, and user-visible quality