Harnham is a well-funded AI research company focused on building next-generation multimodal models for media and interactive experiences. This is a high-impact role that blends research and engineering, centered on the data that powers advanced AI systems: you'll design datasets, run experiments, and build data pipelines that improve model capabilities.
Responsibilities:
- Design multimodal, multitask datasets to unlock new model capabilities
- Run controlled experiments to understand how data impacts model performance
- Build and scale pipelines for synthetic data generation, filtering, and quality control
- Define evaluation frameworks and benchmarks to measure real-world model improvement
- Partner with cross-functional teams to translate product goals into data strategies
Requirements:
- 4+ years of experience in machine learning, ideally with a data-centric focus
- Experience working with large multimodal datasets and generative models
- Strong intuition for how data quality and composition impact model behavior
- Experience across the full ML lifecycle, from data to training to evaluation
- Proficiency with ML frameworks such as PyTorch or JAX
- Experience with distributed systems or distributed compute tools (e.g., Ray, Kubernetes)
- Strong interest in advancing next-generation AI systems
- Experience with synthetic data generation or data curation at scale
- Background working on multimodal or video-based models
- Exposure to evaluation and benchmarking for generative systems