Harvey is transforming how legal and professional services operate by leveraging AI and deep domain expertise. The Senior Product Operations Manager will build and scale the evaluation engine behind Harvey’s platform, ensuring reliable and accurate model behavior while operationalizing evaluation methodologies within the product development lifecycle.
Responsibilities:
- Build and scale the systems that power model and product evaluations across Harvey
- Embed evaluation workflows and readiness checkpoints into the product development lifecycle
- Create the single source of truth for evaluation status, results, history, and launch readiness
- Turn expert-designed evaluation methodologies into scalable, repeatable operational processes
- Manage relationships with human data vendors and ensure evaluation quality meets legal standards
- Work with Engineering and Research to improve evaluation tooling, automation, and dashboards
- Drive evaluation readiness for major product and model launches across geographies and jurisdictions
- Document and operationalize evaluation governance as complexity increases
- Help define how Harvey ensures model accuracy, reliability, and trust at global scale
Requirements:
- 4–7+ years in technical program management, product operations, research operations, or evaluation/benchmarking roles
- Experience working with ML/AI evaluations, benchmarking frameworks, or scientific workflows
- Comfort with statistical methodologies and with SQL, Python, or similar tools for interpreting evaluation data
- Ability to work deeply with legal experts and operationalize complex evaluation methodologies
- Strong cross-functional coordination skills across Product, Engineering, Research, and data providers/vendors
- High attention to detail and a bias toward clarity, rigor, and reproducibility
- Ability to navigate extreme ambiguity and bring order to complex systems
- Strong communication skills and comfort translating technical nuance for diverse stakeholders
- Desire to do whatever it takes to make evaluation systems successful—from writing documentation to diagnosing pipeline issues
Nice to have:
- Experience in legal tech or working with domain experts in regulated industries
- Experience managing human data providers or human-in-the-loop evaluation pipelines
- Background in ML research, data quality management, or evaluation science
- Experience as an early employee at a hyper-growth startup
- Experience at world-class product or platform operations orgs (e.g., Stripe, Ramp)