Flock is a leading safety technology platform focused on crime prevention and security. As a Staff Machine Learning Engineer specializing in Multimodal Modeling, you will enhance core embedding-based retrieval systems and improve model performance through the unification of text and image representations.

Responsibilities:

Lead the advancement of core embedding-based retrieval systems, with a focus on the scientific aspects of modeling
Fine-tune and extend multimodal models (e.g., CLIP, SigLIP) to improve performance, generalization, and cross-modal alignment
Unify text and image representations to improve retrieval performance
Ensure extensibility across evolving product use cases
Deliver fast, accurate, and scalable search experiences powered by state-of-the-art vision-language systems

Requirements:

7+ years of industry experience in Machine Learning with a focus on representation learning, multimodal modeling, or embedding-based retrieval
Deep domain knowledge in at least one area: computer vision, natural language processing, or recommendation systems
Strong proficiency in PyTorch, with experience fine-tuning foundation models and adapting pretrained vision-language models to real-world tasks
Demonstrated ability to customize and extend model architectures, training loops, loss functions, and data pipelines to deliver impact
Experience with embedding-based retrieval, including contrastive learning, multimodal alignment, and designing evaluation methods for vector similarity search and embedding quality
Solid engineering fundamentals in Python, with familiarity with Git, SQL, and Bash
Comfortable working independently and navigating ambiguity, with a track record of solving open-ended modeling problems
Ability to obtain and maintain Criminal Justice Information Services (CJIS) certification as a condition of employment
Familiarity with model compression techniques, such as distillation, quantization, and architecture pruning, to improve inference efficiency and deployability
Experience with vector search infrastructure, including provisioning, maintaining, and querying large-scale vector databases (e.g., FAISS, Weaviate, Pinecone)
Proficiency with multi-GPU and distributed training workflows to scale training of large multimodal models efficiently
