Themis Intelligence is redefining how utilities operate through its Utility Knowledge Base and Human-Guided Intelligence platforms. We are seeking a co-op student to work on the post-training lifecycle of Themis Agents, focusing on evaluation frameworks and model behavior analysis in AI applications.
Responsibilities:
- Evaluate Themis Agents for accuracy, factual consistency, hallucination, and tool correctness
- Analyze grounding failures: cases where models 'go off-script' from retrieved knowledge or internal documents
- Score and compare outputs across tasks like Q&A, summarization, and event reasoning
- Experiment with prompt templates, few-shot examples, and retrieval settings
- Compare vector store retrieval performance across embedding models, chunking strategies, and context window variations
- Run A/B tests across model versions and prompt chains
- Build or extend evaluation pipelines in Python and frameworks like LangChain, OpenAI API, or Transformers
- Visualize and organize test results using tools like Streamlit, pandas, or Dash
- Help define 'hallucination types' and build reproducible test suites for failure tracking
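To give a concrete flavor of the evaluation work above, here is a minimal sketch of a reproducible hallucination test suite in Python with pandas. Everything here is illustrative and hypothetical, not part of any Themis system: `EvalCase`, `grounding_score` (a naive word-overlap grounding check), and the coarse labels `grounded` / `partial_drift` / `off_script` are all placeholder names, and a real pipeline would use stronger checks (NLI models, LLM judges, or tool-call verification).

```python
# Sketch of a reproducible grounding/hallucination test suite.
# All names and the scoring heuristic are illustrative assumptions,
# not part of any Themis or LangChain API.
from dataclasses import dataclass

import pandas as pd


@dataclass
class EvalCase:
    task: str     # e.g. "qa", "summarization", "event_reasoning"
    context: str  # retrieved knowledge the answer must stay grounded in
    answer: str   # model output to evaluate


def grounding_score(case: EvalCase) -> float:
    """Fraction of answer sentences whose content words all appear in the context."""
    context_words = set(case.context.lower().split())
    sentences = [s for s in case.answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    grounded = 0
    for s in sentences:
        # Skip very short tokens to approximate content words.
        words = [w for w in s.lower().split() if len(w) > 3]
        if words and all(w in context_words for w in words):
            grounded += 1
    return grounded / len(sentences)


def classify(score: float) -> str:
    """Bucket a grounding score into a coarse hallucination-type label."""
    if score >= 0.99:
        return "grounded"
    if score > 0.0:
        return "partial_drift"
    return "off_script"


# A tiny fixed test suite keeps failure tracking reproducible across runs.
cases = [
    EvalCase("qa", "the transformer outage began monday",
             "the transformer outage began monday"),
    EvalCase("qa", "the transformer outage began monday",
             "the outage began tuesday at noon"),
]
df = pd.DataFrame(
    {"task": c.task, "score": grounding_score(c), "label": classify(grounding_score(c))}
    for c in cases
)
print(df)
```

Aggregating per-case labels in a DataFrame like this makes it straightforward to compare failure rates across model versions or prompt chains, or to feed the table into a Streamlit or Dash dashboard.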