Preference Model is automating ML research engineering, with a focus on creating high-quality reinforcement learning (RL) environments. In this role you will design and build low-level, kernel-focused RL environments that target specific models and difficulty distributions. The work requires both research and engineering skills.
Responsibilities:
- Design and build low-level, kernel-focused reinforcement learning (RL) environments that target a specified model and difficulty distribution
- Choose which environments are worth building. A strong kernel environment hits several marks:
  - Targets a niche or genuinely hard domain
  - Exercises real hardware features (tiling, streaming, async copy, vector ISAs)
  - Runs on interesting hardware or simulators (FPGAs, novel accelerators, gem5)
  - Is research-motivated and grounded in benchmarks where models lag
  - Has a recognized reference to measure against (cuBLAS, FFTW, OpenSSL, etc.)
  - Scales into many diverse tasks from a single design
- Build correctness and performance scoring that's deterministic and can't be gamed: the objective is clear, and the only way to hit it is to actually write the kernel
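
To make the scoring requirement above concrete, here is a minimal Python sketch of what a deterministic, hard-to-game scorer might look like. It is illustrative only and not a description of Preference Model's actual tooling. The dot-product task and the names `reference_dot`, `make_inputs`, `is_correct`, and `score` are all hypothetical. The sketch assumes performance is reported as a deterministic cost, such as a simulator cycle count, rather than wall-clock time.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical task: a "kernel" here is any callable computing a dot product.
Kernel = Callable[[Sequence[float], Sequence[float]], float]


def reference_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Trusted reference implementation that candidates are measured against."""
    return sum(x * y for x, y in zip(a, b))


def make_inputs(seed: int, n: int) -> Tuple[list, list]:
    """Generate inputs from a fixed seed so every scoring run sees the same data."""
    rng = random.Random(seed)
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return a, b


def is_correct(candidate: Kernel, seeds=(0, 1, 2), n: int = 256,
               tol: float = 1e-9) -> bool:
    """Compare the candidate with the reference on several seeded inputs."""
    for seed in seeds:
        a, b = make_inputs(seed, n)
        try:
            got = candidate(a, b)
            ok = abs(got - reference_dot(a, b)) <= tol
        except Exception:
            return False  # crashes and non-numeric results count as wrong
        if not ok:  # written this way so NaN also fails the check
            return False
    return True


def score(candidate: Kernel, candidate_cost: float, reference_cost: float,
          cap: float = 10.0) -> float:
    """Return 0 unless correct; otherwise a capped speedup over the reference.

    The costs are assumed to be deterministic measurements (for example,
    simulator cycle counts), so the same submission always scores the same.
    """
    if candidate_cost <= 0 or not is_correct(candidate):
        return 0.0
    return min(reference_cost / candidate_cost, cap)
```

In this sketch correctness gates the performance score, so a submission that returns a constant or a NaN earns nothing no matter how low its reported cost is. A real environment would also need hidden or randomized test inputs and a trusted cost measurement that the submission cannot influence.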