Autheo is a pioneering company focused on integrating blockchain technology with enterprise solutions. They are seeking a highly skilled L3 Senior Site Reliability Engineer / Cloud Engineer to design, build, and operate reliable cloud infrastructure for blockchain services and Web3 applications.

Responsibilities:

Architect, deploy, and operate highly available AWS infrastructure optimized for blockchain workloads
Implement Infrastructure as Code (IaC) using Terraform for repeatable, auditable provisioning
Manage production container platforms (EKS, ECS, Kubernetes, Docker, ECR)
Operate and optimize EC2, S3, EBS/FSx, Lambda, and related services
Design VPCs, VPNs, subnets, security groups, routing, load balancers, and network isolation
Implement IAM, KMS, and Secrets Manager for identity, encryption, and key management
Apply scaling techniques for RPC endpoints (load balancing, caching, throttling) and manage public/private peer connectivity
Support and troubleshoot Amazon Linux, Oracle Linux, and Windows Server environments
Deploy, operate, and maintain blockchain nodes (full/archive/light clients) and RPC endpoints on EVM-compatible chains (Ethereum, Polygon, BNB Chain, etc.)
Optimize node performance, storage, networking, and containerization using Docker/Kubernetes
Monitor and troubleshoot blockchain health metrics (block height, peer count, sync status, logs, memory, throughput)
Support on-chain/off-chain interactions, including transactions, gas fees, signing, wallets, smart contract invocations, and state queries
Troubleshoot blockchain errors (transaction failures, RPC timeouts, indexing lag, sync divergence)
Work with API gateways and middleware services (Infura, Alchemy, QuickNode, or equivalents)
Implement indexing for event logs, state, and transactions using tools like The Graph, ETL pipelines, custom services, or database-backed explorers
Implement Terraform, Helm, and GitOps workflows for infrastructure lifecycle management
Enforce resilient, automated, scalable design patterns and collaborate on faster, higher-quality deployments
Own availability, latency, performance, capacity, and SLOs/SLIs/SLAs, guided by observability-driven insights
Lead on-call rotations, S1/S2 incident response, post-incident reviews, and preventive initiatives
Reduce operational toil through automation; build and own CI/CD pipelines (Jenkins, GitHub Actions) covering Terraform validation, Docker builds, and Helm deployments
Instrument blockchain workloads for metrics, logs, traces, predictive signals, and anomaly detection using Datadog, Prometheus, Grafana, ELK, CloudWatch, OpenTelemetry, Wazuh
Build automated alerting, anomaly detection, diagnostics, and end-to-end observability strategies
Implement AIOps for event correlation, anomaly detection, predictive diagnostics, automated remediation, and self-healing (using AWS SageMaker, Bedrock, and other AI tools)
Drive security threat detection/prioritization, capacity planning, forecasting, cost control, and reporting
Enforce cloud security best practices, vulnerability remediation pipelines, and compliance guardrails (SOC 2, PCI DSS, ISO/IEC 27000 series)
Manage cryptographic materials, KMS/HSM, and wallet abstractions (HD, custodial/non-custodial, multisig)

Requirements:

7+ years in Cloud, SRE, Systems, or DevOps Engineering roles
5+ years operating production workloads on AWS
3+ years supporting blockchain infrastructure, nodes, Web3 applications, or DeFi platforms
Strong hands-on experience with AWS services (EC2, EKS, ECS, S3, RDS/Aurora, VPC/VPN, Route53, ALB/NLB, KMS, IAM, Secrets Manager, Lambda, EventBridge, CloudWatch, ECR)
Production experience with containers & Kubernetes
Proficiency with IaC (Terraform, Helm, AWS CDK) and automation/scripting (Python, Bash, or Go preferred)
Working experience with CI/CD (GitHub Actions, Jenkins, Argo, etc.)
Demonstrated experience with observability systems (Datadog, Prometheus, OpenTelemetry, ELK, CloudWatch, Wazuh)
Practical exposure to AIOps concepts (event correlation, predictive diagnostics, anomaly detection, automated response)
Experience supporting 24×7 on-call rotation for production services
Strong understanding of distributed systems, reliability patterns, and fault tolerance
Experience participating in major incident response and post-incident reviews
AWS Certifications (Solutions Architect, DevOps Engineer, SysOps Administrator)
Deep experience with blockchain, Web3, or decentralized system operations
Proven SRE methodology experience, including automation, CI/CD, and IaC development
Experience in compliance-driven environments (SOC 2, PCI DSS, ISO/IEC 27000 series)
