Design and implement end-to-end training pipelines that improve editing quality for both images and videos.
Lead core development for specific mid-training areas.
Identify quality gaps in generative models and propose targeted mid-training solutions.
Develop scalable workflows for data curation, data quality improvements, and distributed training.
Increase engineering velocity and reduce iteration cost by systematizing mid-training experimentation and deployment.
Partner closely with research, data, evaluation, infrastructure, pre-training, and post-training teams.
Requirements
Master’s degree or Ph.D. in Computer Science, Machine Learning, or a related field preferred.
Proven track record in mid-training or continual pre-training of large-scale multimodal models, particularly cross-modal training on image and video data.
Deep understanding of pre-training, mid-training, and/or post-training for multimodal generative models.
Deep understanding of modern diffusion-based architectures (e.g., DiT), conditional generation, and editing.
Strong expertise in Vision-Language Models (VLMs), including experience with contrastive learning and multimodal alignment.
Ability to design and implement scalable pipelines for data curation, data quality control, and distributed training.
Experience optimizing model inference and deployment for high-throughput product environments.