Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We are seeking a GPU Software Engineer with deep expertise in CUDA programming, GPU architecture, and high-performance computing to design and optimize compute-intensive workloads on modern accelerator hardware.

Responsibilities:

Design and implement high-performance CUDA kernels for compute-intensive workloads across AI and HPC use cases
Profile and optimize GPU code using tools such as Nsight Systems, Nsight Compute, and other CUDA profilers
Tune memory access patterns, occupancy, register usage, and shared memory utilization for peak performance
Develop highly optimized libraries for linear algebra, attention, and other ML primitives
Optimize multi-GPU and multi-node training using NCCL, RDMA, and high-performance networking
Implement custom operators and fused kernels in PyTorch, JAX, or Triton
Collaborate with ML engineers to identify performance bottlenecks in training and inference pipelines
Develop benchmarks and regression tests to safeguard performance over time
Evaluate new GPU architectures and feature sets, and advise on adoption strategy
Contribute to compiler-level optimizations for tensor programs where appropriate, working at the boundary between ML frameworks and accelerator code generation to unlock performance that framework-level tuning alone cannot reach
Optimize memory hierarchy usage across HBM, L2, shared memory, and registers
Implement mixed-precision and quantized compute paths that maximize accelerator throughput while preserving numerical fidelity within bounds acceptable for the target workloads
Document performance characteristics, design decisions, and tuning playbooks for internal teams
Stay current with GPU architectures, CUDA releases, and emerging accelerator technologies

Requirements:

Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field
Six or more years of experience in GPU programming and performance engineering
Deep expertise in CUDA C/C++ and GPU programming models
Strong understanding of modern GPU architectures, memory hierarchies, and execution models
Hands-on experience profiling and optimizing GPU workloads in production
Familiarity with NCCL, MPI, and high-performance interconnect technologies
Experience integrating custom kernels into ML frameworks
Strong C++ skills and familiarity with modern systems programming practices
Solid grounding in linear algebra and numerical methods
Strong communication and collaboration skills with research and engineering teams
Experience with Triton, CUTLASS, or other GPU kernel authoring frameworks
Familiarity with TensorRT, FasterTransformer, or vLLM internals
Exposure to compiler infrastructure such as LLVM or MLIR
Open-source contributions to GPU or ML performance libraries
Experience with large-scale distributed training infrastructure
