Drive innovation in model compression and efficient deployment for advanced multimodal AI systems, including large language models (LLMs) and vision-language models (VLMs).
Reduce model footprint and computational cost while preserving accuracy, enabling high-performance AI to run efficiently on resource-constrained edge devices.
Apply and advance compression techniques such as quantization, knowledge distillation, and pruning.
Build robust compression pipelines, establish performance and fidelity metrics, and address bottlenecks in production inference.

A degree in Computer Science or a related field.
Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences).
Experience with PyTorch or an equivalent deep learning framework.
Hands-on experience with model quantization, including both Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ).
Research and hands-on experience with knowledge distillation for compressing large models into smaller, efficient ones.
Research and hands-on experience with model pruning for compressing large models into smaller, efficient ones.
Solid understanding of neural network architectures and training processes, including transformers (e.g., LLMs, VLMs), backpropagation, optimization, and fine-tuning techniques.
Familiarity with C++ is a plus (especially for implementing low-level quantization kernels or inference optimizations).

AI Research Engineer – Model Compression, Quantization

Key skills