About this role

Harnham is a well-funded AI research company focused on building next-generation multimodal models for media and interactive experiences. This is a high-impact role that blends research and engineering, centered on the data that powers advanced AI systems. You'll design datasets, run experiments, and build data pipelines to improve model capabilities.

Responsibilities:

Design multimodal, multitask datasets to unlock new model capabilities
Run controlled experiments to understand how data impacts model performance
Build and scale pipelines for synthetic data generation, filtering, and quality control
Define evaluation frameworks and benchmarks to measure real-world model improvement
Partner with cross-functional teams to translate product goals into data strategies

Requirements:

4+ years of experience in machine learning, ideally with a data-centric focus
Experience working with large multimodal datasets and generative models
Strong intuition for how data quality and composition impact model behavior
Experience across the full ML lifecycle, from data to training to evaluation
Proficiency with ML frameworks such as PyTorch or JAX
Experience with distributed systems or compute tools (e.g., Ray, Kubernetes)
Strong interest in advancing next-generation AI systems
Experience with synthetic data generation or data curation at scale
Background working on multimodal or video-based models
Exposure to evaluation and benchmarking for generative systems

Research Engineer, Data
