Associate Director, Generative AI Evaluation & Standards
Barcelona, Catalonia, Spain
Full Time
1 week ago
Visa Sponsorship
Key skills
R, AI, ML, Generative AI, GenAI, LLM, Large Language Models, RAG, Agentic, MLOps, Leadership, Communication
About this role
Role Overview
Design and maintain evaluation frameworks for GenAI solutions spanning LLM quality, RAG performance, agent reliability, safety, and scientific accuracy.
Develop evaluation criteria with business teams that reflect the quality requirements of each therapeutic area and application type.
Define benchmarks for scientific validity, regulatory compliance, data quality, and operational reliability.
Establish gold-standard validation datasets and automated evaluation pipelines.
Serve as technical lead for the Evaluation & Standards Board, a cross-functional governance body for GenAI quality.
Present evaluation findings and recommendations to the GenAI Portfolio Steering Committee.
Define what “production-ready” means for GenAI in a regulated R&D environment and enforce that standard.
Set evaluation gates for product development stages: PoC, Limited Release, Scaled Product, and Product Ops.
Partner with QMS/MLOps on the handoff from pre-adoption evaluation to production quality monitoring.
Work with therapeutic area teams on domain-specific evaluation standards.
Collaborate with Data Strategy & Products on data quality as an input to the evaluation rubric.
Engage JJIT architecture teams on platform-layer evaluation and security assessments.
Contribute to the weekly GenAI Outlook newsletter, focusing on the evaluation implications of frontier AI developments.
Build internal evaluation capability through training, documentation, and tooling for teams conducting assessments.
Publish evaluation standards and rubrics as reference material for the broader organization.
Build and lead the GenAI Evaluation & Standards team, establishing a culture of scientific rigor and independent judgment.
Requirements
Advanced degree (PhD strongly preferred) in computational biology, bioinformatics, data science, computer science, AI/ML, biomedical engineering, applied mathematics, or a related discipline.
Minimum 8 years of post-academic industry experience, with significant time in pharmaceutical or biotechnology R&D.
Hands-on expertise with generative AI systems: large language models, retrieval-augmented generation, agentic frameworks, and prompt engineering.
Demonstrated track record of designing evaluation frameworks, scientific benchmarks, or quality standards for AI/ML systems.
Strong people leadership experience, including building and managing technical teams in a matrixed organization.
Excellent communication skills: ability to present complex technical findings clearly to non-technical senior stakeholders and to defend unpopular conclusions.
Strategic thinking: ability to connect technical evaluation to business impact and organizational decision-making.