Architect Labs is a frontier AI lab for chip design, focused on building AI models and tools for custom ASICs. They are seeking a Senior Member of Technical Staff to own the compiler stack for their SIMD/VLIW NPU, collaborating closely with the NPU architect to optimize both hardware and software components.
Responsibilities:
- Own the compiler end-to-end: graph ingestion (ONNX, PyTorch) through IR optimization, AI-driven code generation, instruction scheduling, and register allocation for a SIMD/VLIW NPU
- Implement and own the memory management layer, for instance a software-managed on-chip scratchpad in which the compiler handles data tiling, bank allocation, DMA scheduling, and double-buffering across SRAM banks
- Design and iterate on mid-end and backend optimization passes: operator fusion, loop transformations, vectorization, and software pipelining to close the gap between peak and achieved throughput
- Co-design the ISA and instruction encoding with the architect and silicon team, feeding real workload performance data back into architectural decisions
- Support quantization and mixed-precision lowering (32-bit single-precision FP or INT, along with lower-precision INT8/INT4, BF16, and FP16/FP8/FP4 formats) with correct numerics end-to-end
- Benchmark compiler output against cycle-accurate models, RTL simulation, and FPGA prototypes, and own quality-of-results (QoR) tracking
- Grow into a compiler team lead as the team scales