Delos-data is a stealth-mode startup focused on building foundational technology for large-scale AI data center clusters. It is seeking a System Software Engineer to design and implement communication and execution primitives for efficient AI model execution across GPUs.
Responsibilities:
- Collaborate across the stack to influence the design of our foundational technology, ensuring it meets the needs of next-generation AI models
- Identify and resolve performance bottlenecks in distributed training and inference workloads through deep-dive analysis of the software-hardware interface
- Conduct rigorous performance benchmarking and characterization on multi-node clusters