Great Value Hiring is seeking talented MLOps Engineers with expertise in modern ML frameworks such as JAX and PyTorch. The role involves guiding teams to enhance AI model performance and designing solutions for MLOps challenges.
Responsibilities:
- Guide research and engineering teams in closing knowledge gaps and improving AI model performance on MLOps, training infrastructure, and ML framework-level topics
- Design challenging, domain-relevant tasks across multiple specializations, and write accurate and well-structured solutions to MLOps and ML systems problems
- Evaluate MLOps tasks and solutions, and provide clear written technical feedback
- Develop guidelines and detailed rubrics/evaluation frameworks to assess training pipeline design, distributed systems reasoning, and kernel-level optimization across tasks
- Collaborate with other subject matter experts to ensure consistency and accuracy in training data
Requirements:
- 5+ years of dedicated professional experience in ML infrastructure, MLOps, or ML systems engineering at a recognized, top-tier organization
- Hands-on production experience with JAX and/or PyTorch at scale, including distributed training strategies (FSDP, tensor parallelism, pipeline parallelism), memory optimization, and framework-level debugging
- Experience writing or optimizing custom GPU kernels using Pallas (JAX) or Triton, including tiling strategies, memory layout design, and kernel fusion
- Demonstrable career progression
- Ability to commit reliably to at least 30 hours per week on weekdays
- Strong written communication skills and the ability to explain complex technical decisions clearly