Microsoft AI is seeking Data Research Engineers to join its Multimodal team, which develops foundation models across various data types. The role involves curating and analyzing multimodal datasets to support model development and ensure that ethical standards are met.
Responsibilities:
- Create high-quality datasets for training and evaluation; run experiments on new datasets (data ablations) to assess their impact and determine which data is most effective
- Develop and maintain scalable data pipelines for multimodal ingestion, preprocessing, filtering, and annotation
- Analyze real-world multimodal datasets to assess quality, diversity, and relevance, and to identify areas for improvement
- Build lightweight tools and workflows for dataset auditing, visualization, and versioning
- Collaborate with Safety, Ethics, and Governance teams to ensure datasets meet standards for quality, privacy, and responsible AI practices
- Embody our culture and values