ServiceNow, through its acquisition of Moveworks, is at the forefront of transforming workplace efficiency with AI technology. The role focuses on building the infrastructure that powers Moveworks' AI agents, requiring expertise in distributed systems engineering to manage and orchestrate complex workflows in real-time.
Responsibilities:
- Agent orchestration engine — A state machine that manages long-running agent sessions, coordinating planning, execution, and user interaction across multiple LLM calls and tool invocations
- Distributed session management — Lease-based ownership using DynamoDB conditional writes, heartbeat protocols, and crash recovery via checkpointing
- Event-driven message pipeline — SQS FIFO queues for ordered delivery, Kafka consumers for event processing, and real-time streaming via gRPC and Socket.IO
- Structured concurrency — Python asyncio TaskGroups running multiple concurrent tasks per session (message polling, lease heartbeats, output publishing, orchestrator execution) with fail-fast semantics and graceful cancellation
- Observability infrastructure — OpenTelemetry instrumentation, distributed trace context propagation across async boundaries, custom span lifecycle management for sessions that span minutes
- Caching and state layers — Redis, DynamoDB KV stores with per-org/per-bot scoping, batch read optimization, and hot-reload configuration