Workato is a leader in enterprise orchestration, helping businesses streamline operations through its AI-powered platform. The Staff Product Manager will own evaluation for AI agents, establishing frameworks and building customer-facing tools to improve how agent performance is assessed.
Responsibilities:
- Define and own the evaluation framework for Workato's internal AI agent features, driving adoption across teams starting with Agent Studio
- Build the customer-facing evaluation experience — how builders test, measure, and improve agents they create on Workato
- Make hard calls about what evaluation complexity to expose versus abstract, balancing rigor with approachability
- Partner closely with the Build Experience PM to ensure evaluation is integrated into the builder journey, not bolted on
- Work with ML engineers and platform teams to ground the framework in technical reality while keeping it accessible
- Establish metrics for what 'good' looks like — both for internal agent quality and for customer evaluation adoption
- Spend significant time with customers understanding where they struggle to assess agent performance and what mental models they bring
Requirements:
- 7+ years in Product Management
- Hands-on experience writing evaluations for AI/ML systems (agents, LLMs, or similar)
- Track record of shipping technical products to both internal and external users
- Experience driving adoption of frameworks or practices across engineering teams
- Strong written and verbal communication skills
- Bachelor's degree or equivalent experience
What We're Looking For:
- Practitioner depth in evaluations. You've written evals yourself — built test suites, designed rubrics, debugged why agents underperformed. You understand evaluation methodology not only from reading about it, but from doing it. You have opinions about what works, what doesn't, and where current approaches fall short
- Strong product management experience. You've shipped products, driven roadmaps, and led cross-functional teams. You know how to translate technical capabilities into user value and write specs that don't leave details to chance
- Technical translation ability. You can take complex evaluation concepts and make them accessible to business technologists without dumbing them down. You understand the difference between hiding complexity and organizing it
- Internal influence skills. You've driven adoption of frameworks, practices, or tools across teams. You can be a credible partner to ML engineers while advocating for what internal teams actually need
- Greenfield comfort. You've defined products from ambiguity — scoped v1s, made bets with incomplete information, and iterated based on what you learned. You don't need an existing playbook to be effective
- B2B product sensibility. You see enterprise conventions as problems to solve, not constraints to accept. You're drawn to products that make complex workflows feel elegant
Preferred Qualifications:
- Experience with agent architectures, RAG systems, or LLM application development
- Background in ML engineering, solutions architecture, or technical program management prior to moving into product management
- Experience building developer tools or platform products
- Familiarity with evaluation frameworks (e.g., human eval pipelines, automated benchmarks, red-teaming)