Head of AI Evaluation, Reliability Engineering at Codvo.ai | JobVerse
Head of AI Evaluation, Reliability Engineering
Codvo.ai
Pune, Maharashtra, India
Full-time
No visa sponsorship
Key skills
AI
ML
LLM
RAG
MLOps
CI/CD
Leadership
About this role
Role Overview
Build Codvo’s AI Evaluation & Reliability Engineering function as a core platform and engineering capability.
Define engineering standards for AI evaluation, testing, release gating, and runtime monitoring.
Integrate evaluation and reliability frameworks into Codvo’s engineering and delivery lifecycle.
Design reusable evaluation frameworks covering:
LLM and multimodal output quality
RAG grounding and evidence fidelity
Agent reasoning and decision quality
Tool and workflow execution success
Safety, policy, and compliance adherence
Cost, latency, and production economics
Build benchmark packs, golden datasets, and regression suites for priority enterprise workflows.
Define benchmark coverage and versioning standards.
Establish processes for edge-case capture and benchmark expansion.
Design systems and processes for:
Runtime drift and degradation monitoring
Failure mode analysis and incident diagnostics
Human review and escalation pathways
Continuous evaluation and improvement loops
Partner closely with platform, product, and solution engineering teams.
Serve as internal SME on AI reliability, benchmark design, and evaluation methodology.
Help shape architecture standards for AI-native product and workflow delivery.
Build and lead a team of:
Evaluation Engineers
Benchmark / QA Engineers
Reliability / Observability Engineers
Domain Review / Feedback Ops Specialists
(i.e., hiring and managing across evaluation, benchmarking, observability, and human-feedback operations)
Requirements
10+ years in engineering, AI, or ML leadership roles.
5+ years building or operating production AI/ML systems.
Proven experience designing or operating:
AI/LLM evaluation frameworks
Benchmark / regression systems
AI QA / testing / validation infrastructure
Production ML / observability / monitoring systems
Reliability engineering / quality engineering organizations
Technical Expertise
LLM / multimodal evaluation methodologies
Benchmark / golden dataset design
Agent / tool-use / workflow evaluation
RAG evaluation / grounding analysis
AI observability / telemetry / tracing
Human-in-the-loop feedback systems
AI safety / governance / policy testing
Release gating / CI/CD / engineering quality systems
Preferred Backgrounds
AI Infrastructure / Evaluation Platforms
AI Observability / MLOps Companies
Enterprise AI Platform Teams
Applied AI Product / Platform Organizations
Reliability / QA Engineering Leadership in Complex Systems
Benefits
Flexible and hybrid work arrangements