Research and develop techniques to accelerate large-scale applications running on NVIDIA’s family of advanced CPU platforms.
Work directly with technical experts in industry and academia to perform in-depth analysis and optimization of complex data-intensive and compute-intensive workloads, ensuring the best possible CPU performance on modern hardware architectures.
Publish and present discovered optimization techniques in developer blogs or at relevant conferences to engage and educate the developer community.
Influence the design of next-generation hardware architectures, software, and programming models in collaboration with research, hardware, system software, libraries, and tools teams at NVIDIA.
Requirements
Pursuing or recently completed a BS, MS, or PhD in Computer Science, Computer Engineering, or a related field (or equivalent experience).
Relevant work or research experience.
Knowledge of modern CPU architectures (ARM, x86) and system/OS fundamentals.
Experience with CPU architecture fundamentals, especially the memory subsystem (cache, DRAM, storage).
Hands-on experience with low-level parallel and system programming, SIMD vectorization, CPU intrinsics and concurrent data structures.
Programming fluency in modern C/C++ with a deep understanding of algorithms, concurrency, and other optimization techniques.
Good communication, organization, and prioritization skills, with a logical approach to problem solving.