Collaborate with senior engineers and mentors to support the development and evaluation of AI-powered educational tools
Contribute to research-backed initiatives focused on improving AI system behavior, including identifying and measuring issues such as model bias and misalignment (e.g., sycophancy)
Assist in designing and executing experiments to evaluate AI performance in real-world learning scenarios
Translate evaluation findings into actionable insights that improve AI system quality and reliability
Support the mapping of AI evaluation work to established governance or responsible AI frameworks
Participate in code reviews, team discussions, and knowledge-sharing sessions
Take ownership of defined tasks while demonstrating initiative and creative problem-solving
Requirements
Currently in the final year of a Bachelor’s or Master’s program in Computer Science, Engineering, or a related field (graduating soon), or a recent graduate
Strong curiosity about AI, machine learning, and their real-world applications, especially in education
Demonstrated ingenuity and problem-solving ability, with a willingness to explore ambiguous challenges
Ability to take direction and independently translate guidance into meaningful outputs
Foundational knowledge of programming (e.g., Python) and familiarity with machine learning concepts
Interest in AI safety, responsible AI, or evaluation methodologies is a strong plus
Excellent communication and collaboration skills
Self-motivated, proactive, and eager to learn in a fast-paced environment
Tech Stack
Python
Benefits
Colibri Group welcomes applicants from all backgrounds and experiences
Commitment to building a diverse and inclusive workplace
Candidates are encouraged to apply even if their background doesn't align perfectly with every qualification