Python, SQL, AI, Machine Learning, NLP, Natural Language Processing, LLM, Project Management, Communication
About this role
Summary:
0-3 years of experience.
Must have a graduate degree in Linguistics.
Must be a native speaker of a non-English language (preferably Indonesian), with a high level of proficiency in another Austronesian language and broad knowledge of other languages in the same family.
Main duties:
Perform linguistic analyses on large datasets.
Perform linguistic error analysis of AI model outputs, determining what the most frequent and severe error categories are.
Write and revise guidelines for human annotation and other AI projects, including but not limited to translation tasks.
Conduct typological and sociolinguistic research on many languages, highlighting their similarities and differences.
Perform linguistic analyses for Responsible AI (toxic language, hate speech, gender bias and other cultural biases) in massively multilingual settings.
Conduct linguistic literature reviews on various NLP-adjacent topics and summarize findings.
Compare the quality of deliverables from different vendors, identify error patterns, and provide actionable feedback.
Provide information or guidance on any aspect of linguistic knowledge (typology, morpho-syntax, sociolinguistics, classification, phonetics/phonology, pragmatics, etc.).
Reach out to and collaborate with native speakers in various languages.
Communicate results of linguistic analyses to engineers and research scientists.
Skills:
Must have strong written and spoken communication skills, especially business and research communication.
Must be a native speaker of a non-English language (preferably Indonesian), with a high level of proficiency in another Austronesian language and broad knowledge of other languages in the same family.
Working knowledge of other languages is a plus. Proficiency in a low-resource language is valued.
Must be able to code in Python and query databases using SQL; experience with other programming languages used for data analysis is a plus.
Must be able to independently work through complex requests and perform under pressure.
Strong ability to work independently, prioritize, plan, and track work, as well as report progress.
Education or training in the basics of project management is a plus.
Self-motivation is a must.
Working knowledge of international language-classification standards is valued.
Education:
Graduate degree in Linguistics or a related field is a must; a PhD is a plus.
A background or specialization in corpus linguistics is a plus
Experience with field work is a plus
A graduate degree in Literature or English is not an appropriate substitute.
A degree in Computer Science with a specialization in NLP is not an appropriate substitute.
Must have a very firm grasp of the following linguistic fields: language typology, syntax, morphology, sociolinguistics (especially dialectology and discourse analysis), corpus linguistics, writing systems, pragmatics, phonology.
Must have some experience with applying basic Natural Language Processing techniques.
Experience:
Years of experience: 0-3
Experience working cross-functionally
Experience collaborating with machine learning, NLP, or software engineers, or data scientists
Experience contributing to research papers
Important: Preferably no known conflicts of interest in the fields of machine translation, ASR, TTS, or LLM research, because FAIR Linguists contribute to research papers.
Top 3 Must-Have Hard Skills:
Must be a native speaker of a non-English language (preferably Indonesian), with a high level of proficiency in another Austronesian language and broad knowledge of other languages in the same family.
Perform linguistic error analysis of machine translations and identify the most frequent and severe error categories.
Strong skills in pattern recognition, cross-functional communication, and multitasking
Experience with Python
Good to Have Skills:
Experience creating and/or maintaining specialized lexical resources (e.g., profanity dictionaries) is a plus.
Soft Skills: Ability to work independently through ambiguous requests, based on priorities established by CWAM, and to perform under pressure. Able to work cross-functionally.
Typical Day in the Role:
Perform linguistic error analysis of machine translations, determining what the most frequent and severe error categories are.
Research literature on various NLP and AI topics (e.g., machine translation, ASR, TTS, LLM), summarize findings, and make suggestions to the team.
Compare quality of human translations between vendors, identify error patterns, and provide actionable feedback.
Maintain and revise guidelines for human translation and linguistic evaluation.
Conduct typological and sociolinguistic research on low-resource languages, highlighting their similarities and differences.
Help build integrity (profanity, hate-speech, bullying) dictionaries for various languages.
Reach out to and collaborate with native speakers in various languages.