Circle is one of the world’s leading internet financial platform companies, building the foundation of a more open, global economy through digital assets and payment applications. As a Site Reliability Engineer at Circle, you'll harness AI-powered insights to build and maintain production infrastructure, ensuring reliability and performance for a rapidly expanding global customer base.

Responsibilities:

Empower agile development teams with a high-performance CI/CD pipeline, ensuring fast, high-quality releases with measurable performance and quality metrics
Design, maintain, and secure cloud infrastructure using Infrastructure-as-Code tools like Terraform and Crossplane
Automate operational tasks using Go, Python, and serverless solutions (AWS Lambda, Kubernetes Jobs)
Manage and monitor Kubernetes clusters for multiple production workloads
Develop and maintain blockchain infrastructure, managing nodes across Ethereum, Solana, Arbitrum, Base, Avalanche, and others
Ensure system reliability and security by participating in on-call rotations, troubleshooting disruptions, conducting root cause analysis, and collaborating with Security teams on security-focused tools and frameworks
Plan, test, and implement disaster recovery strategies for a highly available microservices architecture
Leverage AI-powered solutions for infrastructure management, log analysis, anomaly detection, capacity planning, predictive maintenance, and performance optimization
Mentor and support team growth, fostering collaboration and scalability

Requirements:

4+ years in DevOps or SRE roles
3+ years in CI/CD platform development and microservices support
Strong observability, problem-solving, and performance optimization skills in complex, distributed systems
Hands-on experience with Blue-Green, Canary, and A/B Testing deployment strategies for services and databases
Understanding of multi-region and multi-cloud architectures
Proficiency in Go, Python, and Shell
Proficiency in using AI tools to support day-to-day work
Excellent communication skills—able to break down technical concepts and foster collaboration
Experience with Kubernetes clusters at scale, containerization, and Helm charts
Experience with modern CI/CD platforms that use complex gates and workflows
Experience with distributed blockchain systems and blockchain full nodes
Experience with networking (routing, DNS, load balancing, edge networking)
Experience with APM, RUM, monitoring, and telemetry tools
Experience with database technologies (PostgreSQL, Redis, OpenSearch)
Experience with migrating and transforming large, complex datasets from diverse sources, structures, and formats
Experience with data warehousing (Apache Airflow, AWS DMS, Snowflake)
Experience with IaC using Terraform or Crossplane for cloud deployments
Experience with AI tools (GitHub Copilot, Gemini, and ChatGPT) for productivity and code quality
Experience with Large Language Models (LLMs) and AI applications in software development and operations
7+ years in DevOps or SRE roles (for Staff Site Reliability Engineer)
5+ years in CI/CD platform development and microservices support (for Staff Site Reliability Engineer)
Ability to understand large-scale initiatives and break them down into smaller tasks while maintaining focus (for Staff Site Reliability Engineer)
Familiarity with multi-region and multi-cloud architectures and their implementations (for Staff Site Reliability Engineer)

Senior/Staff Site Reliability Engineer
