DensityAI, an AI technology company, is seeking a Kernel Engineer to write and optimize compute kernels for its custom AI accelerator. The role involves collaborating with the architecture and compiler teams to improve performance and ensure effective hardware utilization.
Responsibilities:
- Write and optimize compute kernels for a custom AI accelerator, covering tensor operations, data movement patterns, and memory hierarchy exploitation
- Develop and maintain profiling infrastructure to measure kernel performance against architectural targets
- Define and document shuffle patterns for ML kernel primitives across CPU-like control, tensor cores, and CUTLASS-style operations
- Drive kernel DSL design decisions, including thread spawn mechanisms, register-passing conventions, and memory management strategies
- Enable end-to-end kernel execution on the architectural simulator
- Collaborate with the compiler team on the MLIR dialect; your kernels are its primary validation target
- Create onboarding documentation and kernel writing guides for the broader team