QualGent is seeking an AI Reliability QA Engineer to harden its autonomous AI systems for enterprise production environments. The role focuses on engineering reliability into AI-driven workflows so that they remain stable, reproducible, and trustworthy at scale.
Responsibilities:
- Ensure deterministic AI execution
- Identify and eliminate flakiness in AI-generated workflows
- Improve reproducibility across CI, staging, and production environments
- Design validation layers and guardrails for AI agent behavior
- Reduce regression escapes through structured reliability metrics
- Evaluate AI-generated test cases for correctness and coverage gaps
- Design stress-testing frameworks for AI workflows
- Improve system resilience under concurrency and load
- Define SLAs and reliability standards for autonomous execution
- Instrument execution traces across AI decision paths
- Build monitoring dashboards for reliability metrics
- Reduce time-to-diagnosis for complex failures
- Lead incident reviews focused on systemic improvements
Requirements:
- 2–10+ years of experience in QA, SDET, automation engineering, or reliability engineering
- Proven experience reducing flakiness in CI/CD pipelines
- Strong debugging capabilities across frontend, backend, and infrastructure layers
- Experience supporting and improving production releases
- Systems-level thinking with a focus on failure modes and edge cases
- Mobile testing expertise (iOS/Android, emulators, device farms)
- Experience with distributed systems
- Observability tooling experience (Datadog, Prometheus, OpenTelemetry, Sentry, etc.)
- Cloud infrastructure experience (AWS, GCP)
- Exposure to LLM-based or agent-driven systems