MLabs, a company operating at the intersection of decentralized finance and artificial intelligence, is seeking a Senior DevOps / SRE Engineer. The role involves managing infrastructure for autonomous AI trading agents and ensuring reliability and zero-downtime operations in high-stakes financial environments.

Responsibilities:

Agent Infrastructure Management: Build and maintain the infrastructure for concurrent AI trading agents, managing complex cron schedules, state files, and trailing stop processes
Deployment & Orchestration: Deploy and manage agent environments, including workspace persistence, isolated session management, and Model Context Protocol (MCP) server connectivity
CI/CD Pipeline Development: Design and operate pipelines for shipping trading skills and plugins to production without interrupting live trading activity
Zero-Downtime Operations: Execute deployment strategies (blue/green, canary) that keep active financial positions protected during every infrastructure change
Observability & Monitoring: Build comprehensive alerting across the full stack using metrics, logs, and traces to detect agent failures, state file corruption, or infrastructure regressions before financial loss occurs
Cloud & Database Scaling: Operate and scale core platform infrastructure, including Kubernetes (EKS) clusters, Redis, Postgres, ClickHouse, and Kafka
Blockchain Reliability: Maintain blockchain node infrastructure and ensure stable connectivity to exchange APIs and on-chain transaction systems
Incident Leadership: Lead incident response and on-call practices, including debugging, mitigation, and post-mortems to improve long-term platform reliability

Requirements:

Extensive experience in DevOps, SRE, or Infrastructure Engineering, preferably within a startup environment where systems were built from the ground up
Proven track record of deploying, scaling, and debugging production workloads, specifically on AWS EKS
Proficiency with infrastructure-as-code and configuration management tools such as Terraform, Ansible, or equivalent frameworks
Hands-on experience with Docker and Helm for packaging production services
Experience operating production-grade data and messaging systems (Redis, Postgres/RDS, ClickHouse, Kafka)
Strong experience with Prometheus, Grafana, Datadog, Loki, or OpenTelemetry to build proactive operational visibility
Ability to debug across multiple languages, including Python, Node.js, and Go
Understanding of systems where latency and reliability have direct financial consequences
Familiarity with blockchain node infrastructure, exchange APIs, wallet operations, and on-chain monitoring
Experience managing secrets, access controls, and production hardening for sensitive financial environments
Experience defining SLOs and building mature on-call practices
Experience with OpenClaw agent deployments and workspace templates
Familiarity with Model Context Protocol (MCP) server deployment and auth management
Direct experience with Hyperliquid or other decentralized exchange (DEX) protocols
Background in fintech, market data infrastructure, or high-frequency trading systems
