Mendable builds tools for extracting data from the web and is seeking a Research Engineer specializing in Reinforcement Learning. The role involves building training infrastructure, fine-tuning models, and bridging classical RL approaches with modern LLM systems to improve web data processing.

Responsibilities:

Build training infrastructure and reward pipelines from scratch
Design and operate the systems that train and evaluate Firecrawl's models. You'll own the full loop: data collection, reward modeling, training runs, evaluation, and deployment.

Fine-tune models to achieve state-of-the-art results
Take foundation models and make them dramatically better at web data extraction, content understanding, and structured output generation.

Bridge LLM agents and classical RL
Design reward signals for agent behaviors and apply RL methods to improve multi-step agent workflows.

Run fast experiments and iterate
Design experiments that test meaningful hypotheses, run them quickly, and make decisions based on the results.

Communicate clearly to non-RL people
Translate your work into language that engineers, product people, and leadership can understand and act on.

Collaborate closely with the team
Work directly with the Search/IR-focused Research Engineer and the engineering team to connect RL improvements with search, ranking, and the broader product roadmap.

Research Engineer — Reinforcement Learning
