Trase Systems is an innovative AI company that helps enterprises use AI effectively while minimizing complexity and risk. As a Principal Applied ML Researcher, you will lead the ML and LLM strategy, focusing on integrating machine learning systems into production environments and ensuring their reliability and efficiency.
Responsibilities:
- Drive technical breakthroughs in agentic systems, applied ML infrastructure, and LLM-based applications
- Define and evolve the ML/LLM strategy and technology roadmap in alignment with product development
- Act as a principal technical authority, making high-impact architectural and modeling decisions across teams
- Develop prototypes for key technologies to validate new approaches and de-risk system design
- Own the full lifecycle from research and experimentation through production deployment, monitoring, and iteration
- Translate advances in ML into scalable, production-grade systems with measurable impact
- Design how LLMs operate within agent workflows, including tool use, multi-step reasoning, and long-lived execution
- Implement and refine prompting strategies, multi-agent orchestration, memory management, and human-in-the-loop controls for safety and reliability
- Establish patterns for planning, decision-making, and tool orchestration within complex systems
- Own end-to-end quality evaluation of ML-powered systems, including defining metrics, benchmarks, and testing frameworks
- Establish evaluation systems that connect model performance to task success and system-level outcomes
- Ensure systems behave predictably, safely, and reliably in production through monitoring, regression testing, and robust failure handling
- Contribute to the design of ML systems supporting the full lifecycle, including training, fine-tuning, evaluation, deployment, and monitoring
- Drive architecture decisions across model serving, routing, and orchestration, optimizing for latency and cost
- Work across infrastructure layers, including cloud and containerized systems, to ensure scalable and efficient deployment
- Build and deploy enterprise-grade AI systems used by global customers in production environments
- Design systems that operate reliably in regulated and constrained settings, including on-premise, air-gapped, and secure cloud environments
- Ensure systems are auditable, explainable, and compliant with regulatory and organizational requirements
- Write technical reports and design documents summarizing R&D progress, system behavior, and key decisions
- Communicate complex ML concepts and tradeoffs clearly to both technical and non-technical stakeholders
- Drive alignment across research, engineering, and product through strong technical leadership
- Mentor junior and senior engineers and researchers, raising the bar for ML rigor and system-level thinking
- Establish and propagate best practices for ML system design, evaluation, and reliability across the organization
- Influence technical direction beyond immediate teams through high-impact, cross-functional work