Anthropic is a public benefit corporation dedicated to creating reliable and beneficial AI systems. The Environment Scaling team is seeking a Research Engineer to improve our public models through the development of RL environments, with a focus on fine-tuning strategies, vendor relationships, and collaboration with domain experts.
Responsibilities:
- Refine and execute our fine-tuning strategies for adapting Claude to new domains and tasks
- Manage technical relationships with external data vendors, including evaluating data quality and reward design
- Collaborate with domain experts to design data pipelines and evaluations
- Explore novel ways of creating RL environments for high-value tasks
- Develop and improve QA frameworks to catch reward hacking and ensure environment quality
- Partner with other RL research teams and product teams to translate capability goals into training environments and evals
Requirements:
- At least a Bachelor's degree in a related field or equivalent experience
- Experience with fine-tuning large language models for specific domains or real-world use cases
- Experience with reinforcement learning, reward design, or training data curation for LLMs
- Comfortable managing technical vendor relationships and iterating quickly on feedback
- Strong project management and interpersonal skills
- Passionate about making AI more useful and accessible across industries
- Excited about a role combining ML research, data operations, and project management
- Experience training production ML systems
- Familiar with distributed systems and cloud infrastructure
- Domain expertise in an area where we would like to make our models more useful
- Experience working with external vendors or technical partners