Summary:
We're seeking a strong AI Engineer to join our Fundamental AI Research (FAIR) team, an organization focused on making research breakthroughs in AI. Join the core team defining the future of multimodal AI. You will build the data engines and evaluation platforms that power our frontier multimodal LLMs, transforming raw data into high-fidelity benchmarks that drive breakthroughs.
You'll ship high-fidelity, reproducible breakthroughs and bridge frontier research into our product ecosystem to impact billions. You'll collaborate within an interdisciplinary group of scientists and engineers, leveraging world-class compute, data resources, and research facilities.
What you'll do:
- Design human and model-based datasets for training and evaluating MLLMs (image/video/audio/text data).
- Run frontier model evaluations and implement common metrics.
- Use MLLMs to bootstrap, augment, and rebalance datasets.
- Fine-tune models on labeled datasets.
- Ship annotation task UIs and data-heavy internal tools (React/TypeScript).
- Run secure ingestion pipelines from warehouse/warm-storage/flat files at scale.
Required Qualifications:
Must-Have Hard Skills:
- ML/AI fundamentals - fine-tuning (SFT/preference), prompting, model-based eval, and the failure modes of each.
- Multimodal dataset experience - at least one year of hands-on work building image/video/audio/text datasets to train or evaluate models.
- Strong Python + ML tooling - PyTorch (or equivalent), HF ecosystem, comfort running training/sampling jobs on real infra.
- Web/UI engineering - production React/TypeScript; design, vibe code, and ship clean, usable interfaces for annotation and internal tools.
- Data pipelines - SQL plus reliable large-scale data movement: batching, idempotent dedupe, parallelism, retries.
- Experience at a mid-sized or big tech company.
Nice-to-Have Skills:
- MS or PhD in AI or related field
- Research track record in MLLMs - demonstrated through publications or impactful contributions to major projects
- Open-source contributions - experience shipping and maintaining public-facing repositories
Additional Information:
- How will performance be measured? Meeting deadlines, executing tasks with accuracy, etc.
- Required YOE: 5-10+ years of hands-on experience in building multimodal datasets, implementing machine learning evaluations, and developing production-level data and annotation tools.
- Degree/Certifications Required: Academic background in Computer Science, Engineering, or a related quantitative discipline.
- Interview Process: 2 Rounds [Technical + Behavioral]