Senior Language Data Scientist – Search Specialization
United States
Full Time
No H1B
Key skills
Pandas, Python, AI, ML, NLP, Natural Language Processing, Generative AI, Hugging Face, Data Engineering, Statistical Analysis, A/B Testing, Collaboration
About this role
Role Overview
Lead long-term, highly complex and ambiguous projects from the first client discussion through to completion.
Design and improve workflows that create data for AI/ML training and evaluation.
Design and refine search data annotation frameworks, including relevance judging guidelines.
Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers.
Assess and optimize search-specific evaluation approaches, including A/B testing frameworks, ranking metrics, and human evaluation studies of search result quality.
Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance.
Work closely with client stakeholders on understanding goals, gathering requirements, proposing solutions, and executing them.
Contribute to establishing best practices and standards for generative AI development, both with customers and within the organization.
Requirements
MA in (computational) linguistics, data science, computer science (AI / ML / NLU), quantitative social sciences, or a related scientific or quantitative field; PhD strongly preferred.
Ability to collaborate directly with technical stakeholders including senior project managers, data engineers, and research scientists.
Extensive experience working with search-specific language data (queries, documents, relevance judgments, intent labels) and designing human evaluation tasks, including multi-phase and complex workflows.
Advanced knowledge of statistics, metrics (e.g. F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.
Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.
Proficiency in Python to handle and transform large datasets (e.g. pre- and postprocessing data with pandas), perform quantitative analyses, and visualize data (for example with matplotlib or seaborn).
Deep understanding of data pipelines to support ML and NLP workflows.
Knowledge of efficient data collection, transformation, and storage.
Knowledge of data structures, algorithms, and data engineering principles.