Pandas, Python, AI, ML, NLP, Natural Language Processing, Generative AI, GenAI, Hugging Face, Data Engineering, Statistical Analysis, Collaboration
About this role
Role Overview
Lead long-term projects with high complexity and ambiguity, from the first discussion with the client through to completion
Design and improve workflows that create data for AI/ML training and evaluation, including human annotation, data collection, and synthetic data generation workflows
Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers
Critically assess annotation tooling and workflows
Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance
Work closely with client stakeholders to understand goals, gather requirements, propose solutions, and execute them.
Set an ambitious research agenda for improving our products and services
Contribute to establishing best practices and standards for generative AI development with customers and within the organization
Requirements
MA in (computational) linguistics, data science, computer science (AI/ML/NLU), quantitative social sciences, or a related scientific or quantitative field; PhD strongly preferred
Ability to collaborate directly with technical stakeholders including senior project managers, data engineers, and research scientists.
Experience collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals
Ability to design efficient data strategies for complex long-term projects, potentially involving multiple teams and workflows.
Knowledge of how the components of GenAI products and services work together
Experience developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and non-technical stakeholders
Familiarity with GenAI technologies sufficient to improve existing processes and prepare them for future challenges.
Extensive experience working with human language data and designing human evaluation tasks, including multi-phase and complex workflows.
Deep understanding of language and its relationship with culture
Ability to identify ambiguity and subjectivity in language
Ability to work on multilingual and multimodal projects
Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.
Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.
Proficiency in Python to handle and transform large datasets (e.g., pre- and postprocessing data with pandas), perform quantitative analyses, and visualize data (e.g., with matplotlib or seaborn)
Deep understanding of data pipelines that support ML and NLP workflows
Knowledge of efficient data collection, transformation, and storage
Knowledge of data structures, algorithms, and data engineering principles
Excellent interpersonal skills for effective cross-functional stakeholder engagement
Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions
Ability to work independently and collaborate as part of a team
Adaptable to changing technologies and methodologies
Ability to translate experience and research and development information into an understanding of client products and services.
Tech Stack
Pandas
Python
Benefits
Providing technical mentorship and guidance to junior team members