Kake, a leading expert human data platform for AI agents and LLMs, is seeking a Senior Software Engineer to help develop and evaluate AI training data. In this role, you will use your technical expertise to write prompts, produce reference-quality code solutions, and evaluate AI-generated outputs to improve AI systems.

Responsibilities:

Create and review coding tasks based on real-world software engineering scenarios, including debugging, refactoring, code generation, API usage, automated tests, performance, security, and edge cases
Write high-quality reference solutions that are correct, clear, testable, and aligned with task requirements
Evaluate AI-generated code and responses using structured rubrics, assessing correctness, clarity, security, performance, maintainability, and instruction-following
Compare multiple model responses, select the strongest answer, and justify your decision with clear technical reasoning
Identify bugs, hallucinated APIs, missing edge cases, weak explanations, and poor engineering decisions in AI-generated outputs
Work with terminal-based development workflows when needed, including running tests, debugging issues, managing dependencies, and navigating repositories
Follow detailed guidelines consistently and participate in calibration activities to ensure high-quality, reliable evaluations

Requirements:

5+ years of professional software engineering experience in a backend, fullstack, or systems role
Strong proficiency in at least one core programming language, ideally Python, JavaScript/TypeScript, Go, Java, C++, or SQL
Hands-on experience with Terminal-Bench, with the ability to evaluate AI agent performance on terminal-based tasks including compiling code, running tests, managing environments, and completing multi-step software engineering workflows
Comfortable working with Git, command line/terminal, and common development workflows
Ability to evaluate code critically: not only whether it works, but also whether it is well-designed, secure, and maintainable
Prior experience in AI data production, RLHF, data annotation, or LLM evaluation projects
Excellent written and verbal communication skills in English
Ability to work independently in a remote, asynchronous, fast-paced environment
High attention to detail and the ability to follow complex, rubric-based guidelines consistently
Experience with Python-heavy workflows, automated testing frameworks, Docker, Linux, bash, or containerized environments
Experience with repo-level code reasoning, large codebases, or open-source contributions
Background in backend systems, data engineering, DevOps, infrastructure, security, or large codebases

Senior Software Engineer & LLM Code Trainer
