About this role
Role Overview
Design/improve workflows to create data for AI/ML training and evaluation
Dive deep into existing workflows and processes to gather data and insights
Critically assess annotation tooling and workflows
Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance
Work closely with client stakeholders to understand goals, gather requirements, propose solutions, and execute them
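To illustrate the metrics work described above, a minimal sketch of computing precision, recall, and F1 for a binary annotation task against a gold-standard sample. The labels and the positive class name are hypothetical, and this is plain Python for clarity; in practice a library such as scikit-learn would typically be used.

```python
from collections import Counter

def precision_recall_f1(gold, predicted, positive="spam"):
    """Precision, recall, and F1 for one positive class.

    gold and predicted are parallel lists of labels; the positive
    class name here ("spam") is a hypothetical example.
    """
    counts = Counter()
    for g, p in zip(gold, predicted):
        if p == positive and g == positive:
            counts["tp"] += 1          # predicted positive, actually positive
        elif p == positive:
            counts["fp"] += 1          # predicted positive, actually negative
        elif g == positive:
            counts["fn"] += 1          # missed a true positive
    precision = counts["tp"] / (counts["tp"] + counts["fp"]) if counts["tp"] + counts["fp"] else 0.0
    recall = counts["tp"] / (counts["tp"] + counts["fn"]) if counts["tp"] + counts["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold labels vs. annotator output
gold = ["spam", "ham", "spam", "spam", "ham", "ham"]
pred = ["spam", "spam", "spam", "ham", "ham", "ham"]
p, r, f1 = precision_recall_f1(gold, pred)
```

With 2 true positives, 1 false positive, and 1 false negative, precision and recall are both 2/3 here.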
Requirements
MA in (computational) linguistics, data science, computer science (AI/ML/NLU), quantitative social sciences, or a related scientific/quantitative field; PhD strongly preferred
Familiarity with language use in online spaces, in particular language trends and innovations
Extensive experience working with human language data and designing human evaluation tasks, including multi-phase and complex workflows
Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability metrics), and data analysis methods such as sampling
Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face
Proficiency in Python for handling and transforming large datasets, performing quantitative analyses, and visualizing data
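One of the inter-rater reliability metrics named above, Cohen's kappa, can be sketched in plain Python. The rater labels below are hypothetical; real annotation work would draw them from an annotation tool's export, and a library implementation (e.g., scikit-learn's `cohen_kappa_score`) would normally be preferred.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement between two raters,
    corrected for the agreement expected by chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters match
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label rates
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum((counts_a[lab] / n) * (counts_b[lab] / n) for lab in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators on the same six items
a = ["pos", "pos", "neg", "neg", "pos", "neg"]
b = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohens_kappa(a, b)
```

Here the raters agree on 4 of 6 items (observed agreement 2/3) while chance agreement is 0.5, giving a kappa of 1/3, i.e., modest agreement beyond chance.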