Flock is a leading safety technology platform focused on crime prevention and security. As a Staff Machine Learning Engineer specializing in Multimodal Modeling, you will enhance core embedding-based retrieval systems and improve model performance through the unification of text and image representations.
Responsibilities:
- Lead the advancement of core embedding-based retrieval systems, with a focus on the scientific aspects of modeling
- Fine-tune and extend multimodal models (e.g., CLIP, SigLIP) to improve performance, generalization, and cross-modal alignment
- Unify text and image representations to improve model performance
- Ensure extensibility across evolving product use cases
- Deliver fast, accurate, and scalable search experiences powered by state-of-the-art vision-language systems
Requirements:
- 7+ years of industry experience in Machine Learning with a focus on representation learning, multimodal modeling, or embedding-based retrieval
- Deep domain knowledge in at least one area: computer vision, natural language processing, or recommendation systems
- Strong proficiency in PyTorch, with experience fine-tuning foundation models and adapting pretrained vision-language models to real-world tasks
- Demonstrated ability to customize and extend model architectures, training loops, loss functions, and data pipelines to deliver impact
- Experience with embedding-based retrieval, including contrastive learning, multimodal alignment, and designing evaluation methods for vector similarity search and embedding quality
- Solid engineering fundamentals in Python, with familiarity with Git, SQL, and Bash
- Comfort working independently and navigating ambiguity, with a track record of solving open-ended modeling problems
- Ability to obtain and maintain Criminal Justice Information Services (CJIS) certification as a condition of employment
- Familiarity with model compression techniques, such as distillation, quantization, and architecture pruning, to improve inference efficiency and deployability
- Experience with vector search infrastructure, including provisioning, maintaining, and querying large-scale vector databases (e.g., FAISS, Weaviate, Pinecone)
- Proficiency with multi-GPU and distributed training workflows to scale training of large multimodal models efficiently