Accenture, a leading global professional services company, is seeking a Site Reliability Engineer (SRE) with expertise in Dynatrace. The SRE will be responsible for platform monitoring, reliability, and operational resilience across the Cloud Core ecosystem. The role focuses on availability, performance, observability, and incident response readiness in a highly regulated financial services environment.
Responsibilities:
- Work hands-on in Dynatrace, covering end-to-end configuration, dashboard creation, alert tuning, and root cause analysis
- Design, implement, and operate end-to-end monitoring and observability for Cloud Core platforms, including core banking, integrations, and supporting services
- Define and manage SLIs, SLOs, and error budgets aligned to business-critical banking services
- Monitor platform health across availability, latency, throughput, and error rates, proactively identifying reliability risks
- Lead and support incident management, including triage, root cause analysis (RCA), and post-incident reviews
- Partner with application, integration, data, and infrastructure teams to embed reliability into system design and delivery
- Automate operational tasks and monitoring workflows to reduce manual intervention and mean time to recovery (MTTR)
- Support release readiness and change management, ensuring observability and rollback plans are in place before production deployments
- Establish dashboards and reporting that give technology and business stakeholders visibility into operations
Requirements:
- Minimum of 3 years of experience with Dynatrace
- Minimum of 3 years of hands-on experience with cloud-native platforms (Azure preferred; AWS/GCP acceptable)
- Strong understanding of distributed systems, microservices, and event-driven architectures
- Experience supporting mission-critical platforms with high availability requirements
Monitoring & Observability:
- Hands-on expertise in Dynatrace, including end-to-end configuration, dashboard creation, alert tuning, and root cause analysis
- Proven experience with monitoring and observability tools such as Dynatrace, Azure Monitor, App Insights, Prometheus, Grafana, Splunk, Datadog, or equivalents
- Experience designing actionable alerts and reducing alert fatigue
- Strong understanding of logs, metrics, and traces and how to correlate them during incidents