Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. The company is seeking a highly skilled GPU Kernel Engineer to design and optimize custom GPU kernels for large-scale AI systems, working across the hardware-software stack to improve the performance and scalability of its ML platform.
Responsibilities:
- Design, implement, and optimize custom GPU kernels using C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas
- Profile and optimize end-to-end performance of ML operations, with a focus on large-scale LLM training and inference
- Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and custom internal runtimes
- Develop performance models, identify bottlenecks, and deliver kernel-level improvements that significantly accelerate AI workloads
- Collaborate with ML researchers, distributed systems engineers, and model-serving teams to optimize compute performance across the stack
- Work closely with hardware vendors (NVIDIA/AMD) and stay current on the latest GPU architecture capabilities and compiler/toolchain improvements
- Contribute to tooling, documentation, benchmarking suites, and testing frameworks to ensure correctness and performance reproducibility