Reflection AI is on a mission to build open superintelligence and make it accessible to all. They are seeking a Safety Lead to own the adversarial evaluation pipeline for their models, collaborating with the Alignment team and developing automated safety benchmarks to ensure the models are safe and reliable in deployment.
Responsibilities:
- Own the red-teaming and adversarial evaluation pipeline for Reflection’s models, continuously probing for failure modes across security, misuse, and alignment
- Work hand-in-hand with the Alignment team to translate safety findings into concrete guardrails, ensuring models behave reliably under stress and adhere to deployment policies
- Validate that every release meets the lab’s risk thresholds before it ships, serving as a critical gatekeeper for its open-weight releases
- Develop scalable, automated safety benchmarks that evolve alongside model capabilities, moving beyond static datasets to dynamic adversarial testing
- Research and implement state-of-the-art jailbreaking techniques and defenses to stay ahead of potential vulnerabilities in the wild