Virio is building the next-generation B2B GTM stack. As a Harness Engineer, you'll own the intelligence layer that sits between AI models and our product: the system prompts, tool definitions, and evaluation frameworks that determine how our agents perform in production.
Responsibilities:
- Build and refine the system prompts, tool integrations, and context windows that shape how our models behave across our platform. You'll see how small prompt tweaks cascade into measurable product impact. (Nice to have: a background in English, literature, creative writing, or the humanities.)
- Design the evaluation framework (LLM-as-judge evals, deterministic tests, production monitoring) that lets us ship with confidence. You'll define what success looks like and build the metrics to measure it.
- Architect the abstraction layer between our product (file systems, artifacts, skills) and the model's capabilities, making complex multi-step workflows feel natural to the model and reliable to users.
- Own prompt versioning, experimentation, and iteration. You'll A/B test prompt variations, measure their impact on agent quality, and ship improvements across our production system.
- Collaborate with product and engineering to identify signal (where our agents are failing, where users are stuck) and translate that into prompt and architecture improvements.