About this role

Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. The company is seeking a highly skilled GPU Kernel Engineer to design and optimize custom GPU kernels for large-scale AI systems, working across the hardware-software stack to improve the performance and scalability of its ML platform.

Responsibilities:

Design, implement, and optimize custom GPU kernels using C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas
Profile and optimize end-to-end performance of ML operations, with a focus on large-scale LLM training and inference
Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and custom internal runtimes
Develop performance models, identify bottlenecks, and deliver kernel-level improvements that significantly accelerate AI workloads
Collaborate with ML researchers, distributed systems engineers, and model-serving teams to optimize compute performance across the stack
Work closely with hardware vendors (NVIDIA/AMD) and stay current on the latest GPU architecture capabilities and compiler/toolchain improvements
Contribute to tooling, documentation, benchmarking suites, and testing frameworks to ensure correctness and performance reproducibility
