Immediate joiners required! 2 days WFO (work from office) at Hinjewadi Phase 3
Who We Are
Fulcrum Digital is an agile, next-generation digital acceleration company that provides digital transformation and technology services from ideation to implementation. These services apply across a variety of industries, including banking and financial services, insurance, retail, higher education, food, healthcare, and manufacturing.
Key Responsibilities
- Design and execute test cases for LLM agents, RAG pipelines, agentic workflows, and AI-assisted decision tools
- Validate AI outputs against ground truth using structured accuracy scoring on fields such as NAICS codes, risk exposure flags, and hazard group mapping
- Detect hallucinations, reasoning gaps, source fabrication, and misattributions in model-generated content
- Run multi-model comparative testing across GPT, Claude, Gemini, and Perplexity, evaluating accuracy, latency, and output completeness
- Test prompt versions iteratively and track accuracy changes across prompt cycles
- Validate citation accuracy, document ingestion pipelines, and cross-document context handling
- Design edge-case and negative tests for AI-specific failure modes: content filter triggers, tool call limits, missing documents, and incomplete synthesis
- Perform regression testing after model upgrades, prompt changes, and backend fixes, and maintain structured QA sign-off in JIRA
What Makes This Role Different from Traditional QA
- You evaluate whether an AI is reasoning correctly, not just whether the UI behaves as expected
- You build evaluation rubrics for non-deterministic outputs and apply LLM-as-a-Judge techniques to score quality at scale
- You treat every model or prompt change as a potential accuracy regression, not just a functional one
- You understand that in live AI systems, a passing test today does not guarantee a passing test tomorrow
Required Skillsets
- 7–8 years of QA experience, including a minimum of 2 years on Generative AI / LLM-based projects
- Hands-on experience testing chatbots, RAG systems, or agentic AI pipelines
- Proven ability to perform ground truth validation and to detect hallucinations and reasoning failures
- Familiarity with multi-model evaluation, prompt-aware testing, and JIRA-based defect reporting
Preferred Skillsets
- Background in insurance or other regulated industries; exposure to underwriting or risk classification concepts
- Familiarity with Azure OpenAI, AWS Bedrock, or SharePoint-integrated AI environments
- Knowledge of AI governance, content filtering, and PII redaction validation