Composio is building infrastructure that facilitates communication between agents and various work tools. They are seeking a Research Engineer to develop large evaluations, work on search problems, and improve session accuracy using real tool call data.
Responsibilities:
- Build large evals with real tool-calling data, measuring where models fall short in long-horizon tool execution
- Work on search problems for finding semantically similar tools and cached tool execution paths and plans
- Train large agentic harness systems to improve session accuracy with millions of real tool calls as baseline data
- Run supervised fine-tuning (SFT) on our agentic traces and train models with reinforcement learning (RL) on top of our agentic harness and app sandboxes