Callosum is the Intelligent Systems company focused on creating infrastructure for heterogeneous intelligence. In this role, you will build and optimize inference engines for diverse hardware, ensuring that scheduling, memory management, and execution adapt to the capabilities of the underlying accelerator.
Responsibilities:
- Contribute upstream to SGLang and vLLM, and maintain internal forks where our requirements diverge
- Improve hardware awareness within inference engines so that scheduling, memory management, and execution adapt to the capabilities of the underlying accelerator
- Design and implement bespoke parallelism and disaggregation strategies that go beyond default configurations to better exploit heterogeneous hardware
- Work closely with an Accelerator Systems Software engineer to ensure engine-level abstractions map cleanly onto diverse hardware capabilities