Hapiko is a Brooklyn-based company building the future of play. The company is seeking a Senior Data Scientist to optimize the ML pipeline for its product, Stickerbox, which transforms spoken ideas into printable stickers. The role involves training models, evaluating data quality, and improving ML systems for child-friendly applications.
Responsibilities:
- Build and curate large-scale image datasets for training custom models
- Design annotation pipelines and data quality processes
- Analyze training runs and model outputs to guide iteration
- Work with our team to define what to train on and how to evaluate it
- Optimize our transcription pipeline for accuracy and latency
- Improve image generation quality, prompt adherence, and consistency
- Identify bottlenecks and failure modes across the pipeline
- Run experiments and A/B tests to measure improvements
- Refine existing content safety systems and develop new ones to keep outputs child-appropriate
- Expand our evaluation datasets to cover safety edge cases
- Analyze moderation performance and reduce false positives/negatives
- Stay current on best practices for AI safety in generative systems
- Build evaluation frameworks to measure model performance at scale
- Define metrics that correlate with user satisfaction (aesthetic quality, relevance, safety)
- Develop automated evaluation pipelines (LLM-as-judge, CLIP scores, human eval)
- Track experiments and communicate findings to the team
- Optimize prompts for transcription accuracy and image generation quality
- Develop systematic approaches to prompt testing and iteration
- Build prompt templates and guidelines for different use cases