TrueFoundry is building foundational infrastructure for production AI systems. The Senior AI/ML Engineer will design and own core components that enable enterprise customers to run production agentic AI safely and efficiently, focusing on orchestration, model logic, and observability.
Responsibilities:
- Architect and implement scalable agent orchestration patterns (graph-based executors, state management, multi-agent coordination) for production workloads
- Own critical integrations: model adapters, LLM gateway hooks, vector DBs, tools and external APIs, and the platform’s LLMOps flows
- Build and improve tracing, benchmarking, and observability for LLMs and agents: token and cost accounting, p95 latency, throughput, and correctness checks
- Drive the design of safety guardrails: moderation hooks, human-in-the-loop checkpoints, replayable audit trails, and policy enforcement
- Mentor junior engineers, run design reviews, and improve engineering practices (testing, CI/CD, chaos testing for agents)
- Work directly with strategic customers to prototype complex agentic solutions and translate them into product features
Requirements:
- 4–9 years of software engineering experience, including substantial work building distributed systems, infrastructure, or ML platforms
- Deep practical experience integrating and deploying LLMs in production (RAG, retrieval, embedding pipelines)
- Hands-on experience with agent orchestration frameworks (LangGraph / LangChain or custom agent runtimes) and stateful workflow design
- Strong systems knowledge: Kubernetes, container orchestration, service meshes, and performance tuning
- Proven track record of building observability, cost controls, and policy enforcement for production services
- Experience building or contributing to open-source LLM orchestration tools (LangGraph, LangChain, or similar)
- Familiarity with enterprise constraints: hybrid on-prem/cloud deployments, data residency, and compliance requirements
- Background in security, privacy, or model governance for LLMs
- Demonstrated leadership in cross-functional projects and direct customer engagement