Amtex Systems Inc is seeking a Cloud Engineer responsible for cloud operations, reliability, and ServiceNow integration. The role involves ensuring stable cloud environments through monitoring, automation, and collaboration with DevOps teams.
Responsibilities:
- Set up and manage:
  - Monitoring & alerting
  - Centralized logging
  - Backup & archival solutions
- Implement SRE practices:
  - SLIs / SLOs
  - Error budgets
  - Reliability playbooks
- Execute disaster recovery planning & testing
- Ensure high availability and resilience
- Implement:
  - Automated compliance checks
  - Incident response automation
- Build runbooks and incident workflows
- Support observability platform (metrics, logs, tracing)
- Integrate ServiceNow with:
  - AWS Identity Center (access management)
  - AWS Control Tower AFT (account vending)
- Work jointly with TCH team and Architects to implement integrations
- Support operationalization of access and provisioning workflows
- Work closely with DevOps (DevSecOps) Engineer on:
  - Guardrails implementation
  - Automation pipelines
- Coordinate with Architect and stakeholders
Requirements:
- AWS monitoring tools (CloudWatch, CloudTrail)
- SRE concepts (SLOs, SLIs, DR, HA)
- Observability tools (Datadog, Prometheus, etc.)
- Incident management tools
- Backup & DR strategies
- ServiceNow integration experience (preferred/desired)