TRM Labs provides blockchain analytics and AI solutions to help law enforcement and national security agencies, financial institutions, and cryptocurrency businesses detect, investigate, and disrupt crypto-related fraud and financial crime. As a Senior MLOps Engineer focused on LLMOps, you will build and scale the technical infrastructure for AI/ML systems, enabling rapid deployment and integration of AI models into applications.
Responsibilities:
- Build reusable CI/CD workflows for model training, evaluation, and deployment — integrating tools such as Langfuse, GitHub Actions, and experiment tracking
- Automate model versioning, approval workflows, and compliance checks across environments
- Build out a modular and scalable AI infrastructure stack — including vector databases, feature stores, model registries, and observability tooling
- Partner with engineering and data science to embed AI models and agents into real-time applications and workflows
- Continuously evaluate and integrate state-of-the-art AI tools (e.g. LangChain, LlamaIndex, vLLM, MLflow, BentoML)
- Drive AI reliability and governance, enabling experimentation while ensuring compliance, security, and uptime
- Measure and improve AI/ML model performance
- Ensure data accuracy, consistency, and reliability, leading to better model training and inference
- Deploy infrastructure to support offline and online evaluation of LLMs and agents — including regression testing, cost monitoring, and human-in-the-loop workflows
- Enable researchers to iterate quickly by providing sandboxes, dashboards, and reproducible environments
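To make the offline-evaluation responsibility concrete, here is a minimal sketch of a regression-testing harness with per-call cost tracking. All names (`run_offline_eval`, `REGRESSION_CASES`, the stub model, and the flat per-call cost) are hypothetical illustrations, not TRM Labs code; a real harness would call an actual LLM endpoint and use metered token costs.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    passed: int = 0
    failed: int = 0
    total_cost_usd: float = 0.0

# Hypothetical regression suite: (prompt, required substring) pairs.
REGRESSION_CASES = [
    ("Classify the risk of this wallet address", "risk"),
    ("Summarize this transaction graph", "transaction"),
]

def run_offline_eval(model_fn, cases, cost_per_call_usd=0.002):
    """Run a regression suite against a model callable, tracking pass rate and cost."""
    result = EvalResult()
    for prompt, expected in cases:
        response = model_fn(prompt)
        result.total_cost_usd += cost_per_call_usd
        if expected.lower() in response.lower():
            result.passed += 1
        else:
            result.failed += 1
    return result

# Stub standing in for a real LLM serving endpoint.
def stub_model(prompt: str) -> str:
    return f"Echo: {prompt}"

report = run_offline_eval(stub_model, REGRESSION_CASES)
print(f"{report.passed}/{report.passed + report.failed} passed, "
      f"est. cost ${report.total_cost_usd:.4f}")
```

In practice a suite like this would run in CI on every prompt or model change, failing the build on regressions — the same gate described for model approval workflows above.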
Requirements:
- Write high-quality, maintainable software — primarily in Python, but we value engineering ability over language familiarity
- Have a strong background in scalable infrastructure, including:
  - Containerization and orchestration (e.g. Docker, Kubernetes)
  - Infrastructure-as-code and deployment (e.g. Terraform, CI/CD pipelines)
  - Monitoring and logging frameworks (e.g. Datadog, Prometheus, OpenTelemetry)
- Understand and implement MLOps best practices, including:
  - Model versioning and rollback strategies
  - Automated evaluation and drift detection
  - Scalable model and agent serving infrastructure (e.g. vLLM, Triton, BentoML)
- Deploy and maintain LLM and agentic workflows in production, including:
  - Monitoring cost, latency, and performance
  - Capturing traces for analysis and debugging
  - Optimizing prompt/response flows with real-time data access
- Demonstrate strong ownership and pragmatism, balancing infrastructure elegance with iterative delivery and measurable impact
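As an illustration of the latency-monitoring requirement, the following sketch wraps model calls in a timing context manager and reports a p50. The `track_latency` helper and the stand-in `llm_call` are hypothetical; in production these measurements would typically flow to Datadog, Prometheus, or OpenTelemetry rather than an in-process list.

```python
import statistics
import time
from contextlib import contextmanager

latencies_ms: list[float] = []

@contextmanager
def track_latency(store):
    """Record wall-clock duration of the wrapped block, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store.append((time.perf_counter() - start) * 1000)

def llm_call(prompt: str) -> str:
    # Stand-in for a real serving endpoint (e.g. a vLLM deployment).
    return prompt.upper()

for prompt in ["trace this address", "score this entity"]:
    with track_latency(latencies_ms):
        llm_call(prompt)

p50 = statistics.median(latencies_ms)
print(f"p50 latency: {p50:.2f} ms over {len(latencies_ms)} calls")
```

The same wrapper pattern extends naturally to capturing traces and per-call cost alongside latency, which is what the production observability bullets above describe.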