Design and implement multi-agent and digital worker orchestration patterns that enable specialized agents to delegate tasks, collaborate, and accomplish multi-step business goals.
Build stateful and cyclic workflows using frameworks such as LangGraph, CrewAI, AutoGen, or similar, enabling reflection, recovery, and adaptive execution beyond simple linear chains.
Develop reusable orchestration components for routing, retries, fallback logic, structured outputs, and human-in-the-loop interventions.
Define how digital workers compose and invoke reusable skills across common enterprise workflows.
Build and maintain reusable skills that encapsulate business actions, domain logic, tool usage, and workflow steps in a standardized way.
Implement resilience patterns for non-deterministic AI systems, including timeout handling, intelligent retries, degraded execution modes, and escalation paths.
Design systems that support long-running, resumable workflows for agents and digital workers, including checkpointing, persistence, context restoration, and lifecycle management.
Build automated evaluation frameworks to measure workflow quality, skill execution quality, tool-use accuracy, groundedness, safety, and task success.
Drive engineering standards, shared libraries, and reusable reference patterns for developing agents, digital workers, and skills across the platform.
Requirements
8+ years of software engineering experience with strong proficiency in Python and backend/platform engineering.
Hands-on experience building LLM-powered systems, agents, digital workers, or workflow automation platforms in production.
Experience with frameworks such as LangGraph, CrewAI, AutoGen, LangChain, LlamaIndex, or similar.
Strong experience with APIs, distributed systems, cloud-native engineering, and production reliability.
Experience designing and integrating RAG pipelines, tool-calling systems, reusable skills, and structured output patterns.
Experience with at least one major cloud platform such as AWS, Azure, or GCP, along with Docker, Kubernetes, and CI/CD practices.
Ability to design systems with strong trade-off awareness across quality, latency, cost, resilience, and maintainability.