Reddit, a community of communities, is seeking a Staff Research Engineer for Pre-training Science to lead the development of foundational Large Language Models tailored to Reddit's unique culture and language. The role involves defining the Continual Pre-Training strategy and conducting research on multimodality and data curricula to improve model performance.
Responsibilities:
- Architect and validate rigorous Continual Pre-Training (CPT) frameworks, focusing on domain adaptation techniques that effectively transfer Reddit’s knowledge into licensed frontier models
- Design the "Science of Multimodality": Lead research into fusing vision and language encoders to process Reddit’s rich media (images, video) alongside conversational text threads
- Formulate data curriculum strategies: scientifically determining the optimal ratio of "Reddit data" vs. "General data" to maximize community understanding while maintaining safety and reasoning capabilities
- Conduct deep-dive research into Scaling Laws for graph-structured data: investigating how Reddit’s tree-structured conversations affect model convergence compared to flat text
- Design and scale continuous evaluation pipelines (the "Reddit Gym") that monitor model reasoning and safety capabilities in real time, enabling dynamic adjustments to training recipes
- Drive high-stakes architectural decisions regarding compute allocation, distributed training strategies (3D parallelism), and checkpointing mechanisms on AWS Trainium/Nova clusters
- Serve as a force multiplier for the engineering team by setting coding standards, conducting high-level design reviews, and mentoring senior engineers on distributed systems and ML fundamentals
Requirements:
- 7+ years of experience in Machine Learning engineering or research, with a specific focus on LLM Pre-training, Domain Adaptation, or Transfer Learning
- Expert-level proficiency in Python and deep learning frameworks (PyTorch or JAX), with a track record of debugging complex training instabilities at scale
- Deep theoretical understanding of Transformer architectures and Pre-training dynamics—specifically regarding Catastrophic Forgetting and Knowledge Injection
- Experience with Multimodal models (VLM): understanding how to align image/video encoders (e.g., CLIP, SigLIP) with language decoders
- Experience implementing continuous integration/evaluation systems for ML models, measuring generalization and reasoning performance
- Demonstrated ability to communicate complex technical concepts (like loss spikes or convergence issues) to leadership and coordinate efforts across Infrastructure and Data teams
- Published research or open-source contributions in Continual Learning, Curriculum Learning, or Efficient Fine-Tuning (LoRA/PEFT)
- Experience with Graph Neural Networks (GNNs) or processing tree-structured data
- Proficiency in low-level optimization (CUDA, Triton) or distributed training frameworks (Megatron-LM, DeepSpeed, FSDP)
- Familiarity with safety-alignment techniques (RLHF/DPO) and an understanding of how pre-training objectives affect downstream safety