Call For Referral is focused on advancing next-generation AI systems and is seeking a Machine Learning Ops Engineer to support its AI research and engineering teams. The role involves improving ML infrastructure, designing advanced MLOps tasks, and contributing to the performance of large-scale model training.
Responsibilities:
- Support AI research and engineering teams in improving ML infrastructure and training systems
- Design advanced MLOps and ML systems tasks, each paired with an accurate, structured technical solution
- Evaluate outputs on ML systems topics and provide detailed technical feedback
- Develop evaluation rubrics and frameworks for distributed systems, training pipelines, and kernel-level optimization
- Collaborate with domain experts to maintain consistency and quality across AI training workflows
- Contribute to improvements in large-scale model training performance and infrastructure reliability
Requirements:
- 2+ years of professional experience in ML infrastructure, MLOps, or ML systems engineering
- Hands-on production experience with JAX and/or PyTorch at scale
- Experience writing or optimizing GPU kernels using Pallas or Triton
- Strong understanding of ML training systems and distributed infrastructure
- Demonstrated career progression in engineering or AI infrastructure roles
- Ability to commit to a full-time schedule of 40 hours per week on weekdays
- Strong written communication and technical documentation skills