Hark is an artificial intelligence company focused on developing advanced, personalized intelligence that interacts with the world through multiple modalities. The Member of Technical Staff, Post-training will lead the development of post-training strategies for coding agents, using reinforcement learning and simulation to improve model capabilities at scale.
Responsibilities:
- Design and implement post-training strategies, primarily RL-based, to develop strong coding agents capable of multi-step reasoning, tool use, and long-horizon task completion
- Build and scale simulation and scaffolding environments for agentic RL: code execution sandboxes, computer use environments, tool-calling harnesses, and verifiable reward signals
- Develop reward modeling pipelines — including outcome-based, execution-based, and process-based reward signals — and iterate on them based on training dynamics
- Scale synthetic data generation and trajectory distillation pipelines that feed RL training and improve sample efficiency
- Design and run rigorous ablations to understand how algorithm choice, data mixture, reward shaping, and scale interact in the agentic setting
- Build evaluation frameworks grounded in real agent tasks — code correctness, execution success, multi-step tool use — to measure progress and guide iteration
- Collaborate with mid-training, infrastructure, and product teams to translate research insights into durable improvements to the model