Boson AI is pioneering the future of enterprise AI by developing cutting-edge AI solutions. In this role you will engineer and evolve the core Agent OS, building high-performance systems that integrate complex agentic orchestration and dialog management.
Responsibilities:
- System Ownership: Take ownership of the core dialog & policy engine. Define and implement the state machine for agent state representation, the decision-making logic, and the mechanisms for enforcing complex safety policies and guardrails at the execution layer of a workflow
- Distributed Context & Memory: Design, implement, and maintain the high-performance context and memory systems. Focus on low-latency, reliable access to conversational and user history, including the tight integration and optimization of RAG and vector retrieval pipelines for production use
- Agentic Orchestration Frameworks: Define, architect, and deliver robust agentic orchestration patterns, including battle-tested planner–executor schemes, ReAct-style reasoning and acting loops, and resilient, multi-step workflows that programmatically combine tools, LLMs, and stateful memory
- Internal SDK/Framework Development: Build and evolve the internal, production-grade equivalent of frameworks like LangChain/LlamaIndex. Design composable graphs and execution chains with clear APIs and type safety that product engineering teams and low-code builders can safely reuse, extend, and deploy at scale
- Voice Runtime Infrastructure: Own and optimize the voice runtime components for streaming audio, low-latency barge-in detection, and reliable turn-taking protocols. This requires deep collaboration with Application and ML Platform teams to meet tight latency, jitter, and quality of service (QoS) constraints
- Tooling & Integration Architecture: Architect a robust, secure tooling and integration framework (MCP/A2A). This includes building the underlying infrastructure for tool registration, handling complex authentication/authorization, implementing rate limiting/circuit breaking, managing retries, and ensuring typed, validated I/O between agents and external microservices
- Platform Observability & Reliability: Define, instrument, and monitor rigorous SLIs/SLOs for the Agent Platform. Lead engineering efforts to continuously improve reliability, enhance system debuggability (rich, step-level traces and structured logging), and drive core performance optimizations over time
- API & Abstraction Design: Ensure the platform's public-facing APIs and internal abstractions are clear, well-documented, and fundamentally sound, enabling junior and senior engineers alike to compose sophisticated agent behavior without violating system invariants or introducing breaking changes
- Advanced Capabilities R&D: Explore and prototype future capabilities, focusing on the engineering challenges of on-device personalization, privacy-preserving federated learning signals, or novel policy adaptation techniques that influence agent behavior in production