About this role

DensityAI is an AI technology company seeking a Kernel Engineer to write and optimize compute kernels for its custom AI accelerator. The role involves working with the architecture and compiler teams to improve kernel performance and make full use of the hardware.

Responsibilities:

Write and optimize compute kernels for a custom AI accelerator — tensor operations, data movement patterns, memory hierarchy exploitation
Develop and maintain profiling infrastructure to measure kernel performance against architectural targets
Define and document shuffle patterns for ML kernel primitives across CPU-like control, tensor cores, and CUTLASS-style operations
Drive kernel DSL design decisions — thread spawn mechanisms, register passing conventions, and memory management strategies
Enable end-to-end kernel execution on the architectural simulator
Collaborate with the compiler team on the MLIR dialect — your kernels are the primary validation target
Create onboarding documentation and kernel writing guides for the broader team

Kernel Engineer (Compute / Accelerator)
