AI Research Engineer – Model Compression, Quantization
Italy
Full Time
2 weeks ago
No Sponsorship
Key skills
PyTorch, C++, CR, AI, Machine Learning, Deep Learning, NLP, Large Language Models
About this role
Role Overview
Drive innovation in model compression and efficient deployment for advanced multimodal AI systems, including large language models (LLMs) and vision-language models (VLMs).
Reduce model footprint and computational cost while preserving accuracy, enabling high-performance AI to run efficiently on resource-constrained edge devices.
Apply and advance compression techniques such as quantization, knowledge distillation, and pruning.
Build robust compression pipelines, establish performance and fidelity metrics, and address bottlenecks in production inference.
Requirements
A degree in Computer Science or a related field.
Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences).
Experience with PyTorch or an equivalent deep learning framework.
Hands-on experience with model quantization, including both Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ).
Research and hands-on experience with knowledge distillation for compressing large models into smaller, efficient ones.
Research and hands-on experience with model pruning for compressing large models into smaller, efficient ones.
Solid understanding of neural network architectures and training processes, including transformers (e.g., LLMs, VLMs), backpropagation, optimization, and fine-tuning techniques.
Familiarity with C++ is a plus (especially for implementing low-level quantization kernels or inference optimizations).