AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
ABOUT THE ROLE
We are looking for an SRE Operations Engineer to keep production and staging environments running reliably across a cloud-based SaaS platform. You’ll respond to live incidents, reduce operational toil through automation, and improve observability using Kubernetes, Terraform, Grafana, and AWS. This is a hands-on role with real ownership across CI/CD pipelines, GitOps workflows, and on-call rotations.
WHAT YOU WILL DO
- Monitor and support production and staging environments in real time, ensuring high availability, performance, and stability;
- Respond to incidents, perform triage and root cause analysis, and contribute to post-incident reviews and remediation efforts;
- Participate in an on-call rotation with defined SLAs;
- Handle ad-hoc and unplanned operational requests from Product, Support, and internal teams;
- Maintain and enhance monitoring, alerting, dashboards, logs, and metrics, and improve observability practices;
- Support CI/CD pipelines, production releases, and GitOps workflows;
- Contribute to automation efforts to reduce operational toil;
- Maintain and improve Kubernetes-based infrastructure and containerized workloads;
- Support Infrastructure as Code practices and ongoing environment improvements.
MUST HAVES
- 2+ years of experience in Site Reliability Engineering, DevOps, or Production Operations;
- Experience supporting production environments on AWS;
- Experience supporting production SaaS applications;
- Strong understanding of CI/CD systems such as GitHub Actions, Jenkins, or CircleCI;
- Experience with GitOps and strong Git fundamentals;
- Experience using GitHub, Jira, and Confluence in collaborative environments;
- Experience with Kubernetes (for example, EKS or kOps);
- Experience with Docker and containerization;
- Experience with observability tools such as Grafana, Prometheus, Loki, or PagerDuty;
- Experience with scripting languages such as Bash, Python, or Go;
- Experience with Infrastructure as Code tools such as Terraform or Helm;
- Ability to work within structured operational processes and SLAs;
- Strong written and verbal English communication skills;
- Self-driven with a growth mindset.
NICE TO HAVES
- AWS certifications such as Solutions Architect, DevOps Engineer, or SysOps Administrator;
- Experience in multi-tenant SaaS environments;
- Experience working in globally distributed teams;
- Familiarity with ChatOps practices;
- Experience improving monitoring quality and reducing alert fatigue.
PERKS AND BENEFITS
- Professional growth: Mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: USD-based pay with education, fitness, and team activity budgets.
- Exciting projects: Modern solutions with Fortune 500 and top product companies.
- Flextime: Flexible schedule with remote and office options.