Cypress HCM is focused on engineering innovative model evaluations and porting external benchmarks to its internal infrastructure. The Evaluation Engineer will be responsible for implementing novel evaluations, performing quality control, and keeping the team up to date on new benchmarks.
Responsibilities:
- Porting new external benchmarks to the team's internal infrastructure so they can be run as part of the evaluation stack for new model releases
- Keeping up to date with new evals and benchmarks, and pitching the team on porting newly released evals
- Performing rigorous quality control for new and existing evals
- Implementing novel evaluations to measure dangerous capabilities and safety of frontier models
Requirements:
- Strong Python coding experience and the ability to write clean code quickly
- Experience working in a small team on a large, shared codebase
- Experience designing and building model evaluations
- Detail-oriented, with tenacity to dig through transcripts to identify and resolve issues
- Ability to quickly and independently learn new skills and frameworks
- Team player with strong communication skills
- Demonstrated research experience in the evals space
- Experience with agentic evaluations and with Docker