EverCommerce is a company focused on modernizing cloud infrastructure and deployment pipelines. They are seeking a Lead DevOps Engineer to create and manage an automated platform for development and operations teams, one that follows best practices for security, compliance, and scalability.
Responsibilities:
- Design, deploy, and manage AWS ECS-based containerized workloads using Terraform and Spacelift
- Build and optimize self-service infrastructure platforms with Backstage, enabling development teams to deploy services autonomously
- Implement best practices for observability, security, and reliability across cloud environments
- Develop and manage GitHub Actions workflows for automated testing, security scanning, and deployments
- Standardize CI/CD pipelines and release automation processes across teams
- Improve deployment strategies to ensure zero-downtime deployments and infrastructure immutability
- Automate server and container configurations using Ansible
- Develop repeatable, scalable, and version-controlled infrastructure patterns
- Support developers with automated service provisioning and self-service tools
- Embed security and compliance controls into infrastructure and workflows
- Implement role-based access control (RBAC), policy enforcement, and infrastructure security best practices
- Ensure auditability and traceability in infrastructure changes using GitOps methodologies
- Implement observability solutions, including logging, monitoring, and alerting for platform services
- Define SLAs, SLOs, and on-call runbooks to ensure high availability and reliability
- Support production and non-production environments through proactive incident resolution and root cause analysis
Requirements:
- Proven experience in designing, migrating, and managing AWS ECS-based containerized environments
- Deep expertise in Terraform for IaC, with experience in Spacelift.io or similar policy-as-code automation tools
- Hands-on experience with GitHub Actions for CI/CD automation
- Strong knowledge of Backstage.io for developer portal and self-service infrastructure
- Experience with Ansible for configuration management and automation
- Self-service and everything-as-code mindset, with experience designing repeatable, fully automated infrastructure patterns
- Strong understanding of networking, IAM policies, secrets management, and cloud security best practices
- Experience with monitoring and logging solutions (e.g., CloudWatch, New Relic)
- Ability to troubleshoot performance, availability, and scaling issues in containerized and cloud-native environments
- Experience with service mesh technologies (e.g., Istio, Linkerd, or AWS App Mesh)
- Familiarity with FinOps and cost optimization in AWS environments
- Knowledge of SRE principles, SLAs, and error budgets
- Experience with policy-as-code tools like Open Policy Agent (OPA) or HashiCorp Sentinel