AWS, Docker, DynamoDB, Grafana, Kubernetes, ServiceNow, Splunk, Lambda, S3, CloudWatch, Dynatrace, Communication, Remote Work
About this role
Role Overview
Provide incident response within defined SLAs, troubleshoot production issues, and perform root cause analysis.
Monitor and maintain observability using Splunk, CloudWatch, Zabbix, and similar tools.
Investigate issues across AWS services, networking, APIs, and integrations.
Manage Amazon Connect configurations, contact flows, Amazon Lex bots, and integrations with Lambda, S3, QuickSight, and DynamoDB.
Develop visual process flows, standardized troubleshooting playbooks, and how-to guides for support teams.
Document alert resolution steps and maintain runbooks, knowledge repositories, and playbooks.
Analyze ServiceNow incidents, RCAs, and historical events to extract actionable insights for documentation.
Collaborate with platform and operations teams for incident triage, mock troubleshooting sessions, and continuous improvement.
Requirements
4+ years of experience in Production Support, NOC, or Site Reliability Engineering roles.
Minimum 3 years of hands-on experience with Amazon Connect (CCaaS).
Strong knowledge of AWS services including Amazon Connect, Lambda, S3, DynamoDB, and CloudWatch.
Proficiency with monitoring and logging tools such as Splunk, CloudWatch, Dynatrace, Zabbix, and Grafana.
Solid understanding of SLIs, SLOs, and core reliability engineering practices.
Excellent communication abilities and strong documentation skills.
Nice to Have
AWS certifications.
Experience with Docker/Kubernetes.
Knowledge of CCaaS workflows and compliance frameworks (HIPAA, SOC 2).
Tech Stack
AWS
Docker
DynamoDB
Grafana
Kubernetes
ServiceNow
Splunk
Benefits
Culture of Relentless Performance: join an unstoppable technology development team with a 99% project success rate and more than 30% year-over-year revenue growth.
Competitive Pay and Benefits: enjoy a comprehensive compensation and benefits package, including health insurance, language courses, and a relocation program.
Work From Anywhere Culture: make the most of the flexibility that comes with remote work.
Growth Mindset: reap the benefits of a range of professional development opportunities, including certification programs, mentorship and talent investment programs, internal mobility and internship opportunities.
Global Impact: collaborate on impactful projects for top global clients and shape the future of industries.
Welcoming Multicultural Environment: be a part of a dynamic, global team and thrive in an inclusive and supportive work environment with open communication and regular team-building and company social events.
Social Sustainability Values: join our sustainable business practices focused on five pillars: IT education, community empowerment, fair operating practices, environmental sustainability, and gender equality.