Themis Intelligence is redefining how utilities operate through its Utility Knowledge Base and Human-Guided Intelligence platforms. We are seeking a co-op student to work on the post-training lifecycle of Themis Agents, focusing on evaluation frameworks and model behavior analysis in AI applications.
Responsibilities:
- Evaluate Themis Agents for accuracy, factual consistency, hallucination, and tool correctness
- Analyze grounding failures: cases where models 'go off-script' from retrieved knowledge or internal documents
- Score and compare outputs across tasks like Q&A, summarization, and event reasoning
- Experiment with prompt templates, few-shot examples, and retrieval settings
- Compare vector store retrieval performance across embedding models, chunking strategies, and context window variations
- Run A/B tests across model versions and prompt chains
- Build or extend evaluation pipelines in Python and frameworks like LangChain, OpenAI API, or Transformers
- Visualize and organize test results using tools like Streamlit, pandas, or Dash
- Help define 'hallucination types' and build reproducible test suites for failure tracking
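To give a concrete flavor of the evaluation work above, here is a minimal sketch of a reproducible hallucination test suite in Python with pandas. Everything here is illustrative and hypothetical, not part of any Themis system: `EvalCase`, `grounding_score` (a naive word-overlap grounding check), and the coarse labels `grounded` / `partial_drift` / `off_script` are all placeholder names, and a real pipeline would use stronger checks (NLI models, LLM judges, or tool-call verification).

```python
# Sketch of a reproducible grounding/hallucination test suite.
# All names and the scoring heuristic are illustrative assumptions,
# not part of any Themis or LangChain API.
from dataclasses import dataclass

import pandas as pd


@dataclass
class EvalCase:
    task: str     # e.g. "qa", "summarization", "event_reasoning"
    context: str  # retrieved knowledge the answer must stay grounded in
    answer: str   # model output to evaluate


def grounding_score(case: EvalCase) -> float:
    """Fraction of answer sentences whose content words all appear in the context."""
    context_words = set(case.context.lower().split())
    sentences = [s for s in case.answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    grounded = 0
    for s in sentences:
        # Skip very short tokens to approximate content words.
        words = [w for w in s.lower().split() if len(w) > 3]
        if words and all(w in context_words for w in words):
            grounded += 1
    return grounded / len(sentences)


def classify(score: float) -> str:
    """Bucket a grounding score into a coarse hallucination-type label."""
    if score >= 0.99:
        return "grounded"
    if score > 0.0:
        return "partial_drift"
    return "off_script"


# A tiny fixed test suite keeps failure tracking reproducible across runs.
cases = [
    EvalCase("qa", "the transformer outage began monday",
             "the transformer outage began monday"),
    EvalCase("qa", "the transformer outage began monday",
             "the outage began tuesday at noon"),
]
df = pd.DataFrame(
    {"task": c.task, "score": grounding_score(c), "label": classify(grounding_score(c))}
    for c in cases
)
print(df)
```

Aggregating per-case labels in a DataFrame like this makes it straightforward to compare failure rates across model versions or prompt chains, or to feed the table into a Streamlit or Dash dashboard.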