Career Renew is recruiting for a Senior DevOps / SRE Engineer - Trading for one of its clients. The role involves owning and maintaining the infrastructure that supports autonomous AI trading agents, ensuring reliability, security, and performance at scale.

Responsibilities:

Build and maintain the infrastructure that runs dozens of concurrent AI trading agents per user, each with its own cron schedules, state files, and trailing-stop processes
Deploy and manage OpenClaw agent environments, including workspace persistence, cron orchestration, isolated session management, and MCP server connectivity
Design and operate CI/CD pipelines for shipping trading skills, plugins, and agent updates to production without interrupting live trading
Define and execute deployment strategies for production systems, including zero-downtime rollouts, safe rollback mechanisms, and release reliability for live trading workloads
Ensure active positions remain protected through every deployment and infrastructure change
Build monitoring, alerting, and observability across the full stack, using metrics, logs, traces, and dashboards to catch agent failures, orphaned positions, state file corruption, infrastructure regressions, and MCP auth expiration before they cost money
Manage cloud infrastructure across multiple environments with infrastructure-as-code
Operate and scale core platform infrastructure including Kubernetes/EKS clusters, containerized workloads, Redis, Postgres/RDS, ClickHouse, Kafka, and blockchain-adjacent services
Operate blockchain node infrastructure and ensure reliable connectivity to Hyperliquid APIs, on-chain transaction systems, and wallet operations
Own logging, observability, security, and incident response across the full agent stack
Lead incident response and on-call practices across the platform, including debugging, mitigation, postmortems, and long-term reliability improvements
Own backup, recovery, and disaster-readiness for critical infrastructure and trading-supporting systems

Requirements:

Professional DevOps, SRE, or infrastructure engineering experience, ideally in a startup where you built systems from scratch rather than only maintaining existing ones
Strong Kubernetes experience — deploying, scaling, and debugging production workloads, ideally on AWS EKS
Hands-on experience with Docker and Helm for packaging and operating production services
Proficiency with infrastructure-as-code tools such as Terraform, Ansible, or an equivalent
Experience with CI/CD and deployment automation using GitHub Actions, ArgoCD, or similar systems
Strong AWS infrastructure experience; multi-cloud experience is a plus
Experience operating production data and messaging systems such as Redis, Postgres/RDS, ClickHouse, and Kafka
Strong observability experience with Prometheus, Grafana, Datadog, Loki, ELK/OpenSearch/Kibana, OpenTelemetry, or equivalent tooling
Ability to build dashboards, alerts, and operational visibility that surface problems before they escalate
Ability to debug across languages such as Python, Node.js, and Go — you'll be tracing issues through agent scripts, MCP servers, platform services, and infrastructure
Experience owning security-related infrastructure concerns such as access management, secrets handling, production hardening, and operational controls
Experience with incident management, on-call operations, and backup/recovery planning for production systems
Understanding of real-time systems where latency and reliability directly impact financial outcomes — cron jobs that must fire on schedule, state files that cannot corrupt, and atomic operations under concurrent load
Experience designing deployment strategies for systems that cannot tolerate interruption during live financial activity
Familiarity with blockchain or node infrastructure, exchange APIs, wallet operations, and on-chain monitoring
Experience with or willingness to learn MCP (Model Context Protocol) server deployment, auth management, and the agent-to-tool connectivity layer
Hyperliquid experience is a plus, but not required
Experience with OpenClaw, including agent deployments, workspace templates, cron systems, environment management, and session orchestration
Experience with multi-agent systems — orchestrating many independent processes that share infrastructure but operate autonomously
Background in trading systems, market data infrastructure, blockchain infrastructure, or fintech DevOps where uptime has direct financial consequences
Experience defining SLOs, improving operational maturity, and building reliable on-call practices in fast-moving production environments