Peach Pilot is a company that transforms how businesses run by building a platform that connects operations through AI. The Senior Quality Assurance Engineer will be responsible for establishing testing frameworks and ensuring the quality of AI-generated outputs before they reach clients, playing a crucial role in maintaining trust and reliability in the company's deliverables.

Responsibilities:

Establish the testing framework: unit, integration, end-to-end, and AI-specific evaluation pipelines using Playwright and Vitest
Define quality standards, test coverage requirements, and documentation practices in partnership with the Lead Engineer
Audit the existing platform and identify the highest-risk surfaces before the next client deployment
Design evaluation frameworks for non-deterministic LLM outputs — including prompt regression testing, model drift detection, and output quality scoring
Build automated test suites for the agent orchestration layer, including governance-agent audit-trail integrity and human-override behavior
Validate the Company Brain (Memgraph + Qdrant) for data accuracy, retrieval quality, and failure modes under real enterprise data, including entity resolution across systems and temporal data patterns
Test the Analysis Engine pipeline that surfaces Company X-Ray findings, ensuring insights are not just technically accurate but reliable enough to present to a client
Own end-to-end testing of the data ingestion pipelines that connect to client systems (CRM, email, calls, calendars, documents, financial systems) through Nango's 700+ connector integration layer
Test multi-model routing logic to confirm cost-optimized task allocation behaves correctly across LLM providers via LiteLLM
Validate streaming response handling, latency thresholds, and graceful degradation when a model is unavailable or slow
Own file ingestion pipeline testing (Word, Excel, PowerPoint, PDF) including encryption, formatting edge cases, and audit-trail continuity

Requirements:

7+ years of QA engineering experience, with at least 3 years in a senior or lead capacity where you shaped process and standards, not just executed them
You have tested AI/LLM-powered applications. You understand prompt sensitivity, output variance, and how to build eval pipelines that catch regressions across model updates
You speak in ownership: you've built the eval pipeline, owned model quality, gated the release — not just run someone else's test suite
You write test code. Python is your primary tool. You have built and maintained CI/CD-integrated test suites, and you don't wait for someone to file a bug to find one
Hands-on experience with Playwright and Vitest in a production environment. You've built automation frameworks from scratch, not just inherited them
Comfortable testing complex API chains, async/streaming responses, and multi-service workflows. Data pipelines and knowledge graph outputs don't intimidate you
You test for confusion and trust failure, not just broken functionality. Your end users are non-technical executives, and you advocate for them
US-based, able to overlap roughly 5 hours per day with EDT, and available for full-time contract hours
You have experience with LLM evaluation frameworks (e.g., LangSmith, DeepEval, Promptfoo, RAGAS, or custom eval pipelines)
You have tested agent frameworks or orchestration layers in a production environment
You have a background in a regulated industry (insurance, finance, healthcare) where audit-trail integrity is non-negotiable
You have worked alongside Forward Deployed or solutions engineering teams and understand field deployment risk
