Design and implement production agent architectures: state management, error handling, retry logic, graceful degradation, and human-in-the-loop escalation
Build evaluation and testing frameworks for non-deterministic agent workflows: offline tests, synthetic data generation, regression checks, and post-deploy monitoring