Role Overview

Design and build machine learning, NLP, and generative AI systems for scientific discovery, knowledge extraction, decision support, and intelligent content understanding.
Work with large-scale, complex, and heterogeneous data, including scientific publications, research datasets, knowledge graphs, ontologies, taxonomies, citations, metadata, and content from every scientific discipline.
Apply the right technique to each problem, using approaches such as classification, regression, clustering, ranking, feature engineering, deep learning, embeddings, LLMs, retrieval, and generative AI.
Develop capabilities for semantic search, information retrieval, entity extraction, content classification, recommendation, ranking, summarization, question answering, and evidence-grounded generation.
Build, evaluate, fine-tune, prompt, and integrate models into robust production systems, while continuously improving quality, relevance, reliability, and user value.
Write clean, tested, production-quality Python and contribute reusable data science components, packages, and scalable data pipelines for preprocessing, inference, experimentation, monitoring, and continuous improvement.
Support deployment, monitoring, model maintenance, drift detection, automated retraining, and ongoing optimization of data science systems.
Collaborate with engineering, product, UX, analytics, research, and domain experts, and communicate technical concepts, model behavior, insights, trade-offs, and recommendations clearly to technical and non-technical audiences.

Requirements

Experience in data science, machine learning, artificial intelligence, NLP, statistics, applied mathematics, computer science, or a related quantitative area.
Experience working with frontier LLMs such as OpenAI’s GPT models, Anthropic’s Claude, and Google’s Gemini, including fine-tuning LLMs and/or SLMs.
Strong Python skills and a habit of writing clean, maintainable, well-tested code.
A solid grasp of machine learning fundamentals, including supervised and unsupervised learning, feature engineering, model evaluation, model selection, and performance measurement.
Experience working with structured, semi-structured, or unstructured data, especially large-scale text or content datasets.
Familiarity with common data science and machine learning tools such as Pandas, NumPy, SciPy, scikit-learn, PyTorch, TensorFlow, or Matplotlib.
The ability to translate complex and ambiguous requirements into practical, measurable, data-driven solutions, with strong analytical thinking, problem-solving skills, and attention to quality.
Clear communication skills, a collaborative approach to working with engineering, product, and business stakeholders, and a genuine interest in building production-ready systems that deliver real user value.

Tech Stack

NumPy
Pandas
Python
PyTorch
scikit-learn
TensorFlow

Benefits

We promote a healthy work/life balance across the organisation.
Numerous wellbeing initiatives.
Shared parental leave.
Study assistance.
Sabbaticals.

Data Scientist II
