Immediate joiners required! 2 days WFO (work from office), Hinjewadi Phase 3

Who We Are

Fulcrum Digital is an agile, next-generation digital acceleration company that provides digital transformation and technology services from ideation through implementation. These services apply across a variety of industries, including banking and financial services, insurance, retail, higher education, food, healthcare, and manufacturing.

Key Responsibilities

Design and execute test cases for LLM agents, RAG pipelines, agentic workflows, and AI-assisted decision tools
Validate AI outputs against ground truth using structured accuracy scoring (NAICS, risk exposure flags, hazard group mapping)
Detect hallucinations, reasoning gaps, source fabrication, and misattributions in model-generated content
Run comparative testing across multiple models (GPT, Claude, Gemini, and Perplexity), evaluating accuracy, latency, and output completeness
Test prompt versions iteratively and track accuracy changes across prompt cycles
Validate citation accuracy, document ingestion pipelines, and cross-document context handling
Design edge-case and negative tests for AI-specific failure modes such as content filter triggers, tool call limits, missing documents, and incomplete synthesis
Perform regression testing after model upgrades, prompt changes, and backend fixes, and maintain structured QA sign-off in JIRA

What Makes This Role Different from Traditional QA

You evaluate whether an AI is reasoning correctly, not just whether the UI behaves as expected
You build evaluation rubrics for non-deterministic outputs and apply LLM-as-a-Judge techniques to score quality at scale
You treat every model or prompt change as a potential accuracy regression, not just a functional one
You understand that in live AI systems, a passing test today does not guarantee a passing test tomorrow

Required Skillsets

7–8 years of QA experience, with a minimum of 2 years in Generative AI / LLM-based projects
Hands-on experience testing chatbots, RAG systems, or agentic AI pipelines
Proven ability to perform ground truth validation and detect hallucinations and reasoning failures
Familiarity with multi-model evaluation, prompt-aware testing, and JIRA-based defect reporting

Preferred Skillsets

Background in insurance or other regulated industries; exposure to underwriting or risk classification concepts
Familiarity with Azure OpenAI, AWS Bedrock, or SharePoint-integrated AI environments
Knowledge of AI governance, content filtering, and PII redaction validation

AI QA Engineer