Smartsheet, a leader in work management for over 20 years, is seeking a Senior Software Engineer II to enhance its AI-powered work management platform. The role involves owning agent quality, diagnosing failures, and driving improvements in AI systems, particularly around LLM evaluation and prompt engineering.
Responsibilities:
- Own agent quality end-to-end: diagnosis, improvement, and validation across SmartAssist's orchestrator and subagents
- Identify failure modes across quality dimensions (factual accuracy, completeness, tone, actionability, and latency) and prioritize what to fix
- Drive quality improvements through prompt engineering, context engineering, and RAG retrieval tuning
- Extend and mature our evaluation framework: scorers, golden datasets, regression gates, and online evaluation for production traffic
- Close the feedback loop: ensure that every change has a measurable, attributable quality signal
- Collaborate with our Agent Architecture lead to distinguish quality problems that require prompt/context solutions from those that require structural fixes
- Establish repeatable methodology that scales beyond any single agent or subagent
Requirements:
- 8+ years of software engineering experience, with at least 2 years working directly with LLMs in production
- Deep, hands-on experience with prompt engineering and context engineering: you understand how model behavior changes with framing, structure, and input design
- Strong working knowledge of RAG architectures: chunking strategies, embedding models, retrieval evaluation, and failure diagnosis
- Experience building or extending LLM evaluation frameworks: you have designed scorers, worked with golden datasets, and thought carefully about what "good" looks like
- Fluency in agent system design: you don't need to own the architecture, but you can engage as a peer on architectural tradeoffs that affect quality
- Strong Python skills; comfortable working in data-heavy environments (Databricks, Delta tables, or equivalent)
- Ability to communicate complex quality findings (written and verbal) to both technical and non-technical stakeholders: you can explain what's broken, why it matters, and what needs to happen next without losing the room
- Strong cross-functional judgment: you know when to escalate, when to resolve independently, and how to build credibility across engineering, product, and AI platform teams
- A bias for clarity in ambiguous situations: when failure modes are murky and trade-offs are real, you bring structure and a clear point of view rather than waiting for consensus
- Legally eligible to work in the U.S. on an ongoing basis
- BS or MS in Computer Science, a related field, or equivalent industry experience
- Experience with MLflow or similar experiment tracking platforms
- Familiarity with CI-integrated evaluation pipelines
- Experience with multi-agent orchestration frameworks
- Prior work in an Applied AI or LLMOps function within a product company