Trase Systems is an innovative AI company that empowers enterprises to use AI effectively while minimizing complexity and risk. As a Principal Applied ML Researcher, you will lead ML and LLM strategy, focusing on integrating machine learning systems into production environments and ensuring their reliability and efficiency.

Responsibilities:

Drive technical breakthroughs in agentic systems, applied ML infrastructure, and LLM-based applications
Define and evolve the ML/LLM strategy and technology roadmap in alignment with product development
Act as a principal technical authority, making high-impact architectural and modeling decisions across teams
Develop prototypes for key technologies to validate new approaches and de-risk system design
Own the full lifecycle from research and experimentation through production deployment, monitoring, and iteration
Translate advances in ML into scalable, production-grade systems with measurable impact
Design how LLMs operate within agent workflows, tool use, multi-step reasoning, and long-lived execution
Implement and refine prompting strategies, multi-agent orchestration, memory management, and human-in-the-loop controls for safety and reliability
Establish patterns for planning, decision-making, and tool orchestration within complex systems
Own end-to-end quality evaluation of ML-powered systems, including defining metrics, benchmarks, and testing frameworks
Establish evaluation systems that connect model performance to task success and system-level outcomes
Ensure systems behave predictably, safely, and reliably in production through monitoring, regression testing, and robust failure handling
Contribute to the design of ML systems supporting the full lifecycle, including training, fine-tuning, evaluation, deployment, and monitoring
Drive architecture decisions across model serving, routing, orchestration, and latency and cost optimization
Work across infrastructure layers, including cloud and containerized systems, to ensure scalable and efficient deployment
Build and deploy enterprise-grade AI systems used by global customers in production environments
Design systems that operate reliably in regulated and constrained settings, including on-premise, air-gapped, and secure cloud environments
Ensure systems are auditable, explainable, and compliant with regulatory and organizational requirements
Write technical reports and design documents summarizing R&D progress, system behavior, and key decisions
Communicate complex ML concepts and tradeoffs clearly to both technical and non-technical stakeholders
Drive alignment across research, engineering, and product through strong technical leadership
Mentor junior and senior engineers and researchers, raising the bar for ML rigor and system-level thinking
Establish and propagate best practices for ML system design, evaluation, and reliability across the organization
Influence technical direction beyond immediate teams through high-impact, cross-functional work

Principal Applied ML Researcher (Agentic Systems & Applied AI Platform)
