Reddit is a community-driven platform known for its open and authentic conversations. The Senior Research Engineer for Post-training & Evaluation will focus on architecting evaluation suites and fine-tuning pipelines to assess and enhance the performance of Reddit's foundational Large Language Models (LLMs). This role involves collaborating with safety engineering and developing automated evaluation systems.

Responsibilities:

Architect and maintain the "Reddit Benchmark" evaluation suite: A comprehensive harness that rigorously tests model capabilities across Safety, Reasoning, and Reddit-specific knowledge (slang, norms)
Build scalable SFT (Supervised Fine-Tuning) pipelines: Implement efficient, distributed training loops for instruction tuning, converting raw base models into helpful assistants
Develop Model-as-a-Judge systems: Engineer automated evaluation pipelines using strong models (e.g., GPT-5, Nova, Claude) to grade the outputs of our internal models, enabling rapid iteration cycles
Execute Synthetic Data generation strategies: Create and curate high-quality instruction sets to improve model generalization where human data is scarce
Collaborate with Safety Engineering: Translate high-level safety policies into concrete evaluation metrics and unit tests that run in our CI/CD pipelines
Debug post-training instability: Dive deep into loss curves and evaluation logs to identify when fine-tuning is incurring an alignment tax or degrading capabilities

Requirements:

4+ years of professional experience in machine learning engineering, with a focus on LLM fine-tuning or evaluation
Fluency in Python and PyTorch, with experience using libraries like Hugging Face Transformers, vLLM, or lm-eval-harness
Deep understanding of Instruction Tuning (SFT) and how data quality impacts model behavior
Experience building Evaluation Pipelines: You know the difference between MMLU and GSM8K, and how to build a custom domain-specific benchmark
Familiarity with distributed training (FSDP/DeepSpeed) for fine-tuning jobs
Strong data engineering skills for curating and cleaning instruction datasets
Experience with MLflow, Weights & Biases, or other experiment tracking tools
Experience with Synthetic Data generation (e.g., techniques from the Self-Instruct paper)
