Ellipsis Health is a health technology company focused on delivering reliable conversational AI solutions. The Senior QA Engineer will ensure the quality and stability of the AI product through automated testing and root cause analysis, while collaborating with engineering teams and managing client workflows.
Responsibilities:
- Workflow Mapping & Test Case Generation: Deeply analyze assigned client workflows to design robust, comprehensive positive and negative test cases that safeguard system stability
- AI-Driven Test Automation: Build and execute automated test scenarios by configuring shadow agents
- Prompt Evaluation & Optimization: Apply strong prompt awareness to draft, refine, and evaluate the prompts used within the testing framework so that they accurately simulate user behaviors and edge cases
- End-to-End Testing Execution: Apply the appropriate testing methodology (Sanity, Smoke, Regression, or Functional testing) and determine the exact environment (staging, pre-production, production) and timing for each execution
- Deployment Cadence & Cross-Functional Collaboration: Partner closely with engineering teams during release cycles to proactively identify, triage, and unblock technical roadblocks, ensuring the product is continuously deployment-ready
- Daily LLM Defect RCA: Perform rigorous daily root cause analysis on failures inherent to generative AI, including hallucinations, high latency, and logic deviations
- Live Production Call Debugging: Investigate live customer calls and production incidents in real time to unblock critical production use cases
- Audio & Transcription Validation: Query and analyze historical call transcripts, system behaviors, and audio data pipelines to pinpoint where a conversational workflow broke down
- Speech-to-Speech (S2S) Pipeline Monitoring: Monitor and evaluate the end-to-end voice AI pipeline. This involves analyzing Automatic Speech Recognition (ASR) accuracy, managing audio-to-text latency issues, and understanding general Speech-to-Speech mechanics alongside the stability of the core Knowledge Base feeding the AI
- Advanced Evaluation Frameworks: Maintain a strong conceptual understanding of advanced LLM evaluation paradigms and tools, such as LLM-as-a-judge, to remain aware of how AI response quality and accuracy are programmatically graded at scale
- Telephony & Call Flow Awareness: Possess a foundational understanding of real-world call management and telephony routing concepts, including how the system is expected to navigate warm transfers, blind transfers, and voicemail detection workflows