TraceLink is the world’s largest Agentic Business Network, enabling life sciences and healthcare companies to build and manage a scalable digital workforce of governed, no-code AI agents. The role involves leading the design and deployment of production-grade GenAI and ML systems, with a focus on hands-on development of multi-agent architectures in cloud environments.
Responsibilities:
- Own the hands-on build and delivery of multi-agent systems (planner/executor, tool-using agents, supervisor patterns, routing, role-based agents) from prototype to production
- Write production-quality code for agent orchestration, tool integration, memory/state design, and context management
- Lead context engineering strategies for multi-agent coordination: prompt design, state persistence, agent handoffs, grounding, constraints, and safety controls
- Fine-tune and deploy small language models (SLMs) for production use, working hands-on across dataset creation, training workflows, evaluation, and inference serving
- Build advanced RAG pipelines end to end, including semantic search, embeddings, hybrid retrieval, and cross-encoder reranking
- Implement evaluation frameworks for multi-agent systems covering quality, latency, cost, robustness, and failure mode detection
- Collaborate with platform and product engineering to ensure solutions are cloud-native, secure, observable, and scalable (monitoring, logging, CI/CD)
- Optimize for cost and latency via model routing, caching, compression strategies, and inference efficiency improvements
- Mentor peers through code reviews, architecture sessions, and hands-on technical leadership