Evolent partners with health plans and providers to improve outcomes for people with complex health conditions. The company is seeking an ML/LLM Operations Engineer to ensure its AI systems deliver reliable, compliant results in healthcare, working closely with Data Science, Engineering, and other cross-functional teams.
Responsibilities:
- Develop and maintain standardized evaluation frameworks to consistently measure LLM performance across relevant healthcare metrics
- Build monitoring systems using Logfire to track AI model performance, detect drift, and alert the team to anomalies
- Create testing infrastructure for prompt versions, model selection, and quality assurance processes
- Design and implement audit sampling processes for continuous quality monitoring and clinical review workflows
- Oversee regulatory compliance processes, including documentation for bias assessments, model cards, and audit trails required in healthcare
- Optimize LLM operations through intelligent model selection, prompt engineering, and cost management strategies
- Support the transition from successful POCs to production-ready services with appropriate testing and validation
- Partner with DevOps on infrastructure requirements while focusing on AI-specific monitoring and optimization
- Create and maintain documentation, runbooks, and operational procedures for all deployed AI systems
- Collaborate with the Clinical Support Liaison to incorporate clinical feedback into system improvements
- Prepare regular reports on AI system quality, performance metrics, and compliance status
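To make the monitoring and audit-sampling responsibilities above concrete, here is a minimal, illustrative sketch of the kind of tooling this role might build. The function names, metrics, and thresholds are hypothetical, not Evolent's actual systems:

```python
import random
import statistics

def detect_drift(recent_scores, baseline_mean, threshold=0.1):
    """Flag drift when the mean of recent quality scores falls
    more than `threshold` below the established baseline."""
    recent_mean = statistics.mean(recent_scores)
    return (baseline_mean - recent_mean) > threshold

def sample_for_audit(records, rate=0.05, seed=None):
    """Randomly select a fraction of LLM responses for clinical review.
    A fixed seed makes the audit sample reproducible."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < rate]
```

In production, a check like `detect_drift` would typically feed an alerting pipeline (e.g., via Logfire, which the posting names), and `sample_for_audit` would route the selected responses into a clinical review queue.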
Requirements:
- Bachelor's or master's degree in computer science, data science, or a related field
- 2+ years of experience with Python development and at least one production LLM implementation
- Strong proficiency in SQL for complex log analysis and metrics generation
- Demonstrated experience with LLM APIs and frameworks (e.g., PydanticAI, LangChain, or similar)
- Experience with monitoring tools and practices for AI systems, including performance metrics, drift detection, and alerting
- Understanding of LLM behavior, prompt engineering, and common failure modes in production
- Experience building evaluation or testing frameworks for AI/ML systems
- Strong communication skills for cross-functional collaboration
- Experience with healthcare AI applications and compliance requirements
- Familiarity with multiple LLM providers (OpenAI, Anthropic, Google, Azure)
- Knowledge of Pydantic ecosystem including PydanticAI and Logfire
- Understanding of LLM evaluation metrics and methodologies
- Experience building tools for non-technical users
- Basic knowledge of containerization (Docker) for local testing and development
- Experience with cloud environments (AWS, Azure) as a user
- Understanding of API rate limiting, quota management, and cost optimization strategies
- Knowledge of CI/CD concepts for ML model deployments
- Experience with regulatory compliance and audit processes
- Excellent documentation skills and attention to detail
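As an illustration of the rate-limiting and quota-management knowledge the requirements call for, the sketch below shows jittered exponential backoff, a common pattern for handling LLM provider rate limits. All names and defaults are hypothetical:

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      retryable=(TimeoutError,), sleep=time.sleep):
    """Call `fn`, retrying on retryable errors with jittered
    exponential backoff; re-raise after the final attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise
            # Double the delay each attempt, cap it, and add jitter
            # so many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

In practice the `retryable` tuple would be the provider SDK's rate-limit exception (e.g., an HTTP 429 error type), and the `sleep` parameter is injected here so the behavior can be unit-tested without real waits.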