Act as the technical owner for enterprise customer VLM post-training engagements.
Translate customer requirements into concrete multimodal post-training specifications and workflows.
Design and execute visual data generation, filtering, and quality assessment processes, including image-text pair curation, annotation pipelines, and synthetic data generation for visual tasks.
Run supervised fine-tuning, preference alignment, and reinforcement learning workflows for vision-language models.
Design task-specific evaluations for visual understanding, grounding, OCR, document parsing, and other multimodal capabilities. Interpret results and feed learnings back into core post-training pipelines.

Hands-on experience with data generation and evaluation for VLM or multimodal post-training.
Experience training or fine-tuning vision-language models using SFT, preference alignment, and/or RL.
Strong intuition for visual data quality, annotation design, and multimodal evaluation.
Familiarity with vision encoders, image-text architectures, and how visual representations interact with language model backbones.
Nice to have: experience with visual grounding, document understanding, OCR, or video understanding tasks.
Nice to have: experience contributing to shared or general-purpose multimodal post-training infrastructure.
Nice to have: prior exposure to customer-facing or applied ML delivery environments.
Nice to have: familiarity with multimodal alignment or RL techniques beyond basic supervised fine-tuning.

Competitive base salary with equity in a unicorn-stage company.
Health: We pay 100% of medical, dental, and vision premiums for employees and dependents.
Financial: 401(k) matching up to 4% of base pay.
Time Off: Unlimited PTO plus company-wide Refill Days throughout the year.

Member of Technical Staff – Post Training, Applied Vision

Key skills