Momentive Software is a company dedicated to amplifying the impact of purpose-driven organizations worldwide. It is seeking an AI Quality Engineer to design evaluation frameworks for AI systems, build automated test pipelines, and collaborate with engineering and product teams to ensure the quality and reliability of AI features.

Responsibilities:

Design and implement evaluation frameworks (evals) to assess LLM and agentic AI system quality, including accuracy, consistency, safety, and task completion rates
Build and maintain automated test pipelines for AI features, covering unit, integration, and end-to-end scenarios across agentic workflows
Develop tooling to detect regressions in model behavior, prompt outputs, and agent decision-making across releases
Define and track quality metrics for AI systems (e.g., hallucination rates, tool-use accuracy, latency, failure recovery) and surface findings clearly to stakeholders
Collaborate with engineers and product managers to identify edge cases, adversarial inputs, and failure modes specific to multi-step agentic pipelines
Contribute to prompt evaluation strategies, including red-teaming, adversarial testing, and bias/fairness assessments
Participate in design and code reviews with a quality-focused lens, raising concerns about testability and reliability early
Help define and document quality standards and best practices for AI/ML features across the team
Other duties as assigned

Requirements:

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
3–5 years of professional software engineering or quality engineering experience
Hands-on experience working with LLMs or agentic AI systems (e.g., GPT-4, Claude, Gemini, or open-source models)
Proficiency in Python for scripting, test automation, and data analysis
Experience designing and running evaluations (evals) for generative AI or LLM-powered features
Solid understanding of software testing principles: unit, integration, regression, and end-to-end testing
Familiarity with agentic frameworks and concepts (e.g., tool use, multi-step reasoning, retrieval-augmented generation, memory)
Experience with CI/CD pipelines and integrating automated tests into development workflows
Strong analytical skills, including the ability to interpret probabilistic outputs and distinguish meaningful regressions from expected variance
Strong written and verbal communication skills; ability to clearly document findings and present quality data to non-technical stakeholders
Detail-oriented, with a structured approach to exploring edge cases and failure scenarios
Ability to work in a fast-paced environment and manage multiple priorities effectively
Experience with prompt engineering and systematic prompt evaluation methodologies
Familiarity with AI safety, alignment, or responsible AI concepts (e.g., hallucination mitigation, bias detection, guardrails)
Exposure to agentic orchestration frameworks (e.g., LangChain, LangGraph, AutoGen, CrewAI, or similar)
Experience with vector databases or RAG pipelines (e.g., Pinecone, Weaviate, pgvector)
Knowledge of observability and monitoring tools for AI systems (e.g., LangSmith, Weights & Biases, Arize)
Background in data science or ML experimentation practices
Experience with version control systems (Git) and defect-tracking tools (e.g., Jira)
Exposure to cloud platforms (e.g., AWS, Azure, GCP) in the context of deploying or testing AI services
