Recruits Lab is a well-funded, advanced AI research company focused on building next-generation foundation models. They are seeking a Senior Machine Learning Engineer to join their core model engineering team; the role is responsible for building, scaling, and optimizing large language models and for leading engineering efforts in distributed training and performance optimization.
Responsibilities:
- Lead end-to-end engineering of large language models (10B–100B+ parameters)
- Implement large-scale pre-training, supervised fine-tuning (SFT), and alignment pipelines
- Optimize model architectures and training strategies based on scaling laws and product objectives
- Drive measurable improvements in performance, reasoning capability, and training efficiency
- Architect and optimize multi-node GPU distributed training systems (A100/H100/B200 environments)
- Implement advanced parallelism strategies: data, tensor, pipeline, and sequence parallelism
- Maximize Model FLOPs Utilization (MFU) and overall cluster efficiency
- Improve training stability, fault tolerance, and monitoring
- Build and maintain TB- to PB-scale data pipelines
- Implement ingestion, cleaning, deduplication (MinHash/LSH), safety filtering, and PII removal
- Support multimodal data strategies, synthetic data generation, and curriculum learning
- Productionize alignment techniques (RLHF, DPO, KTO)
- Work with Mixture-of-Experts (MoE) architectures and routing optimization
- Improve model reasoning, math, and coding performance
- Build and enhance agent and tool-calling systems
- Uphold strong coding and system design standards
- Identify and eliminate performance bottlenecks
- Take ownership of major system components end-to-end
Requirements:
- MS/PhD in Computer Science, AI, or Mathematics, or equivalent practical experience
- Strong hands-on experience in engineering and optimizing large-scale deep learning systems
- Deep understanding of Transformer architectures (RoPE, FlashAttention, SwiGLU)
- Experience working with modern open-source or proprietary LLMs
- Advanced proficiency in PyTorch or JAX
- Experience with Megatron-LM, DeepSpeed, FSDP, or equivalent frameworks
- Strong understanding of 3D parallelism and ZeRO optimization strategies
- Hands-on experience training on large GPU clusters (100+ GPUs preferred)
- Familiarity with InfiniBand, RDMA, and storage I/O optimization
- Experience debugging large distributed training runs
- Highly self-driven and execution-focused
- Strong system ownership mindset
- Comfortable operating in fast-moving R&D environments
- Open-source contributions in the LLM ecosystem
- Experience building agentic systems or multi-step reasoning frameworks
- CUDA or Triton kernel optimization experience
- Published research or major production LLM deployments