Mendable, a company focused on improving web data extraction, is seeking a Research Engineer specializing in Reinforcement Learning. The role involves building training infrastructure, fine-tuning models, and bridging classical RL approaches with modern LLM systems to improve web data processing capabilities.
Responsibilities:
- Build training infrastructure and reward pipelines from scratch
  - Design and operate the systems that train and evaluate Firecrawl's models
  - Own the full loop: data collection, reward modeling, training runs, evaluation, and deployment
- Fine-tune models to achieve state-of-the-art results
  - Take foundation models and make them dramatically better at web data extraction, content understanding, and structured output generation
- Bridge LLM agents and classical RL
  - Design reward signals for agent behaviors and apply RL methods to improve multi-step agent workflows
- Run fast experiments and iterate
  - Design experiments that test meaningful hypotheses, run them quickly, and make decisions based on the results
- Communicate clearly to non-RL people
  - Translate your work into language that engineers, product people, and leadership can understand and act on
- Collaborate closely with the team
  - Work directly with the Search/IR-focused Research Engineer and the engineering team to connect RL improvements with search, ranking, and the broader product roadmap