Participate in defining and validating data-driven business hypotheses and experiments;
Design and execute advanced analyses, metrics, and evaluation strategies for agents (e.g., golden set, regression tests, failure analysis);
Build and maintain data pipelines and training/test datasets with a focus on reproducibility and quality;
Develop and fine-tune ML models when necessary (baseline → production), and integrate with generative AI solutions;
Support the design and improvement of RAG (Retrieval-Augmented Generation): chunking strategy, metadata, deduplication, ranking signals, and knowledge base updates;
Implement and maintain observability: structured logs, quality metrics, dashboards, and alerts;
Work closely with engineering and product teams to define architecture, requirements, trade-offs, and the roadmap;
Review PRs, share knowledge, and mentor less experienced colleagues.

Experience in Natural Language Processing (NLP): tokenization, stemming/lemmatization, stop words, and n-grams; text representations such as TF-IDF, Bag-of-Words, and dense embeddings.
Experience in Information Retrieval (IR): inverted indices, classic ranking models (BM25, TF-IDF), and search system evaluation using metrics like Precision, Recall, F1, MRR, and NDCG.
Experience with semantic search.
Hands-on experience with Python for prototyping and experimentation (Jupyter Notebooks or similar).
Experience with modern NLP libraries such as Hugging Face Transformers, sentence-transformers, spaCy, NLTK, or equivalents.
Experience training, fine-tuning, or adapting models for tasks such as semantic similarity, text classification, and search/information retrieval.
Knowledge of the Machine Learning model lifecycle: dataset creation and preparation (collection, cleaning, and curation); comparative evaluation between model versions; and experiment versioning and reproducibility.
Ability to collaborate with engineering teams to define inference APIs and translate product needs into model-based solutions.
Advanced English (speaking, reading, and writing).

40-hour workweek under a CLT employment contract, with flexible hours and a hybrid setup (4 days in the office and 1 day remote)
Gympass (WellHub), workplace exercise, quick massage, and psychological support
Medical and dental plans for you and your family
Childcare assistance for miniSiDiers, 120-day maternity leave, and extended paternity leave
Company contributions to a private pension plan
Educational incentives for continued studies and specialization, language fluency support, and a weekly lecture series on global trends
Flexible Meal and Food Vouchers
Transportation subsidy and parking for employees working at the office
Annual performance bonus and awards for SiDiers who achieve outstanding results
Committees focused on Well-being, Diversity, Mental Health, Social initiatives, Sustainability, and Women in Technology inclusion
Relaxed, collaborative workspaces with common areas, a decompression room, kitchen, and coffee machine
Dozens of partnerships offering additional benefits and discounts!

Data Scientist – Mid-level

Key skills