Welocalize is seeking a Prompt Engineer who will own the end-to-end technical migration workflow for transitioning templates to LLM autoraters. The role involves applying prompt engineering techniques with internal tools to maximize model performance and ensure high-quality outputs.
Responsibilities:
- Utilize Automatic Prompt Generation (APG) tools to create baseline prompts for complex parent-child template clusters
- Run and supervise the Automated Prompt Optimization (APO) tool, review its outputs, and flag when the APO reaches deadlocks or plateaus
- Manually draft, test, and refine prompts to navigate complex template architectures, overcome anti-patterns, and solve edge-case scenarios where tooling is lacking or broken
- Monitor shadowbot runs to ensure that a sufficient number of disagreements between human and LLM ratings are generated, registered, and tracked
- Run prompt versions against established gold data to continuously measure autorater quality against the human crowd baseline, calculating accuracy metrics such as precision, recall, and F1 score
- Draft technical launch readiness justifications (Launch Certification Documentation) for final approval
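For candidates unfamiliar with the evaluation step above, the gold-data comparison amounts to scoring autorater labels against human labels. A minimal sketch in Python (function and label names are hypothetical, not Welocalize tooling):

```python
def binary_metrics(human_labels, llm_labels, positive="violation"):
    """Compare LLM autorater labels against human gold labels and
    compute precision, recall, and F1 for the positive class."""
    tp = fp = fn = 0
    for human, llm in zip(human_labels, llm_labels):
        if llm == positive and human == positive:
            tp += 1          # autorater and human agree on positive
        elif llm == positive:
            fp += 1          # autorater flagged, human did not
        elif human == positive:
            fn += 1          # human flagged, autorater missed it
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical gold set: human crowd baseline vs. autorater output
gold = ["violation", "ok", "violation", "ok", "violation"]
auto = ["violation", "violation", "ok", "ok", "violation"]
print(binary_metrics(gold, auto))
```

In practice these metrics would be tracked per prompt version so that regressions against the human baseline surface before launch certification.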
Requirements:
- Native fluency in English
- Must be based in the United States
- Bachelor's, Master's, or Doctorate degree in Computer Science, Data Science, Computational Linguistics, Human-Computer Interaction (HCI), Cognitive Science, or a related analytical field
- At least 4 years' experience as a Prompt Engineer, with proven experience tuning Large Language Models (LLMs) for strict, structured outputs and complex classification tasks, and familiarity with chain-of-thought and few-shot prompting
- Strong proficiency in identifying error patterns, analyzing model performance, and using SQL or other data analytics tools
- Ability to quickly learn and master proprietary tools with minimal supervision
- Excellent verbal and written communication skills
- Familiarity with enterprise-grade LLM interfaces like the Goose API
- Experience in AI model evaluation, data science, computational linguistics, or software engineering
- Hands-on experience with Automated Prompt Optimization (APO) systems or tuning workflows
- Linguistic expertise, including an understanding of semantics and logic