Design, implement, benchmark, and iterate on CUDA-based kernels and custom operators to squeeze every last drop of performance out of on-vehicle inference workloads.
Build and improve tooling and infrastructure that make it easier to profile, debug, and validate CUDA kernels and accelerator-backend code across the AV stack.
Partner with AI Solutions, Compilers, and Architecture to translate model and system requirements into concrete kernel roadmaps, priorities, and project plans.
Collaborate with cross-functional teams (compiler, performance tooling, runtime, deployment solutions) to deliver reusable, reliable, high-performance libraries into production.
Maintain high technical standards, methodologies, processes, and guidelines for GPU kernel development and performance engineering through code review.
Manage relationships with internal customers to ensure our kernels and libraries meet real-world needs.
Requirements
Minimum of 2 years of relevant industry experience, or equivalent experience.
BS, MS, or PhD in CS or a related technical field.
Excellent GPU programming skills in CUDA, with a thorough understanding of parallel programming patterns and GPU architecture.
Hands-on experience benchmarking, profiling, debugging, and optimizing accelerator libraries and kernels to extract optimal performance using the Nsight suite of tools or similar.
Strong background in software architecture, library design, and design patterns.
Strong C++ programming skills and the ability to work comfortably in large codebases.
Solid background in system performance, high-performance computing, and/or architecture-aware optimizations.
Strong communication skills and the ability to work collaboratively within a team.