Cerebras Systems builds the world's largest AI chip, delivering unparalleled AI compute. In this role you will work with the inference model team to validate and accelerate new model ideas on wafer-scale hardware, prototype architectural changes, and build performance-evaluation pipelines.
Responsibilities:
- Prototype and benchmark cutting-edge ideas: new attention mechanisms, mixture-of-experts (MoE), speculative decoding, and other innovations as they emerge
- Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull requests
- Work closely with the compiler, runtime, and silicon teams — a unique opportunity to experience the full stack of software and hardware innovation
- Keep pace with the latest open- and closed-source models; be the first to run them at wafer scale and expose new optimization opportunities