San Francisco, California, United States of America
Full Time
1 week ago
$139,764 - $287,749 USD
No Visa Sponsorship
Key skills
Python, Spark, SQL, AI, Machine Learning (ML), Generative AI, LLM (Large Language Models), Communication, OWASP
About this role
Role Overview
Design and develop automated adversarial testing methodologies — including single-turn, multi-turn, and multimodal attack strategies — to proactively identify vulnerabilities in Pinterest's Generative AI products.
Build and calibrate hybrid evaluation pipelines combining LLM-based judges, classifiers, and rule-based systems to accurately detect safety violations, policy breaches, bias, and representational harms.
Develop and operationalize harm taxonomies grounded in industry standards and Pinterest's Responsible AI and Trust & Safety threat models.
Design adaptive refinement loops that learn from attack outcomes (near-misses, partial failures) to iteratively surface deeper and previously unknown vulnerabilities.
Bring scientific rigor and statistical methods to the evaluation of AI safety — including benchmark dataset construction, evaluation calibration, and success-metric definition (vulnerability severity, coverage breadth, pre-launch risk reduction).
Work cross-functionally to build relationships, proactively communicate key findings, and collaborate closely with ML engineers, Trust & Safety specialists, policy teams, product managers, and legal partners to ensure safe product launches.
Relentlessly focus on impact, whether through influencing product safety strategy, advancing responsible AI metrics, or improving critical evaluation processes.
Mentor and up-level junior data scientists and cross-functional partners on adversarial evaluation, responsible AI methodologies, and safety-aware data science practices.
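The hybrid evaluation pipelines described above (LLM-based judges combined with rule-based systems) might be sketched as follows. This is a minimal illustrative sketch, not Pinterest's actual system: the blocklist pattern, the `Verdict` type, and the heuristic standing in for a real LLM-judge call are all hypothetical.

```python
# Minimal sketch of a hybrid safety-evaluation pipeline: a cheap,
# high-precision rule-based layer short-circuits before a (stubbed)
# LLM-as-judge layer. All names and patterns are illustrative.
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    violation: bool
    source: str  # which evaluator flagged it: "rule" or "judge"


# Rule-based layer: fast, deterministic patterns (illustrative only).
BLOCKLIST = [re.compile(r"\bignore (all )?previous instructions\b", re.IGNORECASE)]


def rule_check(text: str) -> Optional[Verdict]:
    """Return a Verdict if a blocklist pattern fires, else None."""
    for pattern in BLOCKLIST:
        if pattern.search(text):
            return Verdict(violation=True, source="rule")
    return None


def llm_judge(text: str) -> Verdict:
    """Placeholder for a real LLM-as-judge call; a trivial keyword
    heuristic stands in so the sketch stays self-contained."""
    return Verdict(violation="jailbreak" in text.lower(), source="judge")


def evaluate(text: str) -> Verdict:
    # Rules short-circuit; the judge handles the long tail.
    return rule_check(text) or llm_judge(text)
```

In practice the judge layer would call a calibrated model, and verdicts would be logged against a harm taxonomy rather than a boolean, but the layering (deterministic rules first, model-based judgment second) is the core idea.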
Requirements
5+ years of experience analyzing data in a fast-paced, data-driven environment with proven ability to apply scientific methods to solve real-world problems on web-scale data.
Strong interest and hands-on experience in one or more of: AI safety, adversarial machine learning, red teaming, responsible AI, or trust & safety.
Deep familiarity with large language models (LLMs), generative AI systems, and their failure modes — including prompt injection, jailbreaks, bias, and safety violations.
Experience designing and calibrating evaluation frameworks for AI systems — including LLM-as-judge, classifier-based evaluation, and benchmark dataset construction.
Strong quantitative programming (Python) and data manipulation skills (SQL/Spark); experience with ML pipelines and large-scale experimentation.
Familiarity with AI safety taxonomies and frameworks (e.g., OWASP LLM Top 10, MITRE ATLAS) is strongly preferred.
Ability to work independently, drive ambiguous projects end-to-end, and operate with high ownership.
Excellent written and verbal communication skills, with the ability to explain complex technical findings to both technical and non-technical partners.
A team player eager to partner across Responsible AI, Trust & Safety, Product, Engineering, Policy, and Legal to turn safety insights into action.
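The "adaptive refinement loops that learn from attack outcomes" mentioned in the responsibilities could, at their simplest, look like a hill-climbing search over attack prompts, where near-misses are mutated and retried. The sketch below is a toy under stated assumptions: `score` stands in for a target-model call plus safety evaluation, and `mutate` stands in for an attacker-LLM paraphrase step; neither reflects a real system.

```python
# Illustrative sketch of an adaptive red-teaming refinement loop:
# attack variants that improve the violation score (near-misses)
# are kept and mutated further. Scoring and mutation are toy stand-ins.

def score(prompt: str) -> float:
    # Stand-in for querying the target model and scoring the response;
    # returns 1.0 for a full violation, intermediate values for near-misses.
    return min(1.0, prompt.count("!") * 0.4)


def mutate(prompt: str) -> str:
    # Toy mutation; a real system might paraphrase via an attacker LLM.
    return prompt + "!"


def refine(seed: str, rounds: int = 5, threshold: float = 0.9):
    """Hill-climb from a seed prompt until the violation score
    crosses the threshold or the round budget is exhausted."""
    best, best_score = seed, score(seed)
    for _ in range(rounds):
        if best_score >= threshold:
            break
        candidate = mutate(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # keep near-miss improvements
            best, best_score = candidate, candidate_score
    return best, best_score
```

Production loops replace the scalar score with richer feedback (partial-failure categories, judge rationales) and search over a population of candidates rather than a single lineage.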
Tech Stack
Python
Spark
SQL
Benefits
Information regarding the culture at Pinterest and benefits available for this position can be found here.