Design, build, iterate on, and maintain prompts for production AI features, including article summaries, content classification, brand safety classification, and new features as they launch.
Design evaluation criteria and workflows for AI features.
Define what "good" looks like for each use case, build evaluation datasets, run quality assessments, and track accuracy over time.
Design and execute workflows for human feedback on AI outputs.
Partner closely with engineers on your team who own the technical implementation of AI features.
Requirements
2–4 years of experience in content operations, content classification, editorial workflows, taxonomy management, or a related field
Demonstrated experience with prompt engineering for large language models in a professional or academic setting
Experience designing evaluation criteria or quality assessment workflows for content or AI outputs
Strong understanding of how LLMs work, including their capabilities, limitations, stochastic behavior, and cost implications
Excellent written and verbal communication skills, with the ability to explain technical AI concepts to non-technical audiences
Strong attention to detail, solid analytical skills, and comfort working with evaluation data and quality metrics
A collaborative mindset and comfort working closely with software engineers and editorial stakeholders
Comfort navigating ambiguity in a fast-evolving space where applied AI best practices are still being established