Elios Talent is seeking a Senior DevOps / SRE Engineer to ensure platform reliability and manage CI/CD pipelines in a high-scale production environment. The role involves close collaboration with engineering teams to build and maintain cloud-native infrastructure, focusing on Kubernetes and reliability engineering best practices.

Responsibilities:

Design, build, and maintain CI/CD pipelines using reusable GitHub Actions workflows
Own GitOps workflows using ArgoCD, managing application promotion across environments
Operate and upgrade Kubernetes clusters (EKS), including node groups, autoscaling, and cluster add-ons
Manage infrastructure as code using Terraform, including PR-driven workflows and state management
Define and maintain SLOs, alerting strategies, and observability dashboards across platform services
Operate and maintain secrets management systems (HashiCorp Vault), including policies and authentication
Implement supply chain security controls including image scanning, signing, SBOM generation, and policy enforcement
Partner with security teams on network policies, egress controls, and compliance requirements
Participate in on-call rotations and lead incident response and post-incident reviews

Requirements:

6+ years of experience in DevOps, SRE, Platform Engineering, or Production Operations
Strong experience managing CI/CD pipelines, GitOps workflows, and Kubernetes in production environments
Experience operating and scaling Kubernetes clusters (EKS preferred)
Expertise in infrastructure as code (Terraform), including state management and automated deployment workflows
Proven experience implementing observability and reliability practices (SLOs, alerting, dashboards, incident response)
Experience with secrets management systems such as HashiCorp Vault
Strong collaboration skills with the ability to support multiple engineering teams
Key skills:

Kubernetes (cluster operations, autoscaling, RBAC, workload isolation, upgrades)
GitOps (ArgoCD configuration, sync policies, rollback strategies)
CI/CD (GitHub Actions, reusable workflows, deployment gates, secrets management)
Terraform (modular design, state management, Atlantis workflows)
Observability tools (Prometheus, Grafana, Loki, Tempo, Alertmanager)
Service mesh (Istio, mTLS, traffic management, authorization policies)
Autoscaling and provisioning (KEDA, Karpenter)
Secrets management (HashiCorp Vault)
Container and supply chain security (Trivy, Cosign, SBOMs, OPA/Gatekeeper)
Scripting and automation (Python, Bash)
Experience leveraging AI tools to accelerate infrastructure development, CI/CD workflows, and operational processes
Familiarity with AI-assisted incident response, log analysis, and runbook generation
Ability to integrate AI-driven quality and security checks into delivery pipelines
Strong ownership mindset over reliability, scalability, and system performance
Focus on automation and eliminating manual operational work
Ability to proactively identify and address reliability risks
Clear and structured communication during incidents and operational events
