Snorkel AI is a data development company spun out of the Stanford AI Lab, focused on improving the truthfulness and reasoning abilities of advanced AI systems. The company is seeking an Expert Contributor - DevOps Engineer to complete real-world, end-to-end tasks in terminal environments and help shape the next generation of AI models. The role involves designing complex tasks, evaluating AI coding agents, and providing expert feedback to guide model behavior.
Responsibilities:
- Design complex, verifiable terminal tasks spanning Python scripting, automation, data pipelines, software engineering, system administration, and infrastructure — each with a clear, reproducible solution and automated verification criteria
- Challenge a coding agent with difficult, targeted prompts designed to expose its weaknesses and test the limits of what it can do within an existing codebase or repository
- Provide multi-turn preference ratings and feedback as you guide the model iteratively toward an ideal solution — evaluating correctness, approach, and engineering judgment at each step
- Develop detailed prompts that present realistic, end-to-end engineering scenarios, paired with structured checklists of specific criteria to evaluate correctness, efficiency, and approach
- Write grading rubrics and scoring guides modeled on real task completion standards — accounting for edge cases, failure modes, and alternative valid approaches
Requirements:
- Demonstrated ability to complete hard, end-to-end terminal tasks autonomously, including compiling code, training models, configuring servers, administering systems, performing security tasks, running data science workflows, and debugging systems
- Deep fluency in Python and Linux/Unix environments, shell scripting, containerization (Docker/Kubernetes), and package/environment management
- Strong code review instincts — able to identify suboptimal solutions, suggest improvements, and evaluate competing approaches across multiple turns of a conversation
- Ability to think through multi-step terminal workflows from first principles and clearly articulate the reasoning, tradeoffs, and edge cases involved
- Background in one or more of: DevOps, SRE, Platform Engineering, Backend/Systems Engineering, or Security/Penetration Testing
- Familiarity with agent evaluation frameworks, CI/CD integration, or long-horizon planning in automated pipelines