Cloud, Grafana, Kubernetes, Linux, Prometheus, Splunk, AI, Datadog, OpenTelemetry, Remote Work
About this role
Role Overview
Investigate and resolve complex technical issues across application, infrastructure, and platform layers
Analyze logs, metrics, traces, and alerts to identify root causes and correlate symptoms across systems
Troubleshoot Kubernetes and container issues, including pods, services, deployments, networking, and resources
Engage directly with enterprise banking customers to understand, investigate, and resolve technical issues
Communicate findings clearly to technical and non-technical customer stakeholders during investigations
Use tools such as Prometheus, Grafana, Loki, OpenTelemetry, Datadog, Splunk, or similar to investigate system behavior
Work with the platform's AI-driven SRE capabilities (incident insights, auto-remediation, and AI-based root cause analysis), validating their findings and acting on them
Improve runbooks, dashboards, alerts, and support processes based on recurring issues and lessons learned
Support Kubernetes-based banking platform environments across cloud, hybrid, and customer-hosted deployments
Requirements
Strong hands-on troubleshooting experience in production or customer-facing technical environments
Hands-on experience with Kubernetes and containers
Ability to read, interpret, and correlate logs, metrics, traces, and alerts
Strong understanding of Linux, networking fundamentals, APIs, services, and distributed system behavior
Proven experience in at least one customer-facing technical role
Experience working directly with customers during investigations, escalations, or production support scenarios