Callosum is the Intelligent Systems company focused on creating infrastructure for heterogeneous intelligence. The role involves building and optimizing inference engines for diverse hardware, ensuring that scheduling, memory management, and execution adapt to the underlying accelerator capabilities.

Responsibilities:

Contribute upstream to SGLang and vLLM, and maintain internal forks where our requirements diverge
Improve hardware awareness within inference engines so that scheduling, memory management, and execution adapt to the capabilities of the underlying accelerator
Design and implement bespoke parallelism and disaggregation strategies that go beyond default configurations to better exploit heterogeneous hardware
Work closely with an Accelerator Systems Software engineer to ensure engine-level abstractions map cleanly onto diverse hardware capabilities

Inference Engine Development - Member of Technical Staff
