Cerebras Systems builds the world's largest AI chip, delivering unparalleled AI compute. In this role you will work with the inference model team to validate and accelerate new model ideas on wafer-scale hardware, prototype architectural changes, and build performance-evaluation pipelines.
Responsibilities:
- Prototype and benchmark cutting-edge ideas: new attention mechanisms, mixture-of-experts (MoE), speculative decoding, and other innovations as they emerge
- Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull requests
- Work closely with the compiler, runtime, and silicon teams — a unique opportunity to experience the full stack of software and hardware innovation
- Keep pace with the latest open- and closed-source models; be the first to run them at wafer scale and expose new optimization opportunities