Act as the primary technical point of contact when CX escalates customer-impacting issues, translating business impact into clear technical problem statements.
Triage incoming incidents and assess their severity and urgency.
Manage the incident lifecycle from initial report through resolution, communicating status updates clearly to relevant stakeholders (CX, Product, Engineering).
Develop and maintain comprehensive runbooks and knowledge base articles for common issues and standard operational procedures.
Troubleshoot and debug complex production issues using logging platforms (e.g., Splunk, ELK stack), monitoring tools (e.g., Datadog, Prometheus, Grafana), and database query tools (SQL, NoSQL) to diagnose root causes.
Perform code-level analysis when necessary to pinpoint defects or architectural weaknesses contributing to production instability.
Collaborate effectively with Product Development teams to prioritize, document, and hand off confirmed bugs and large-scale systemic issues for permanent resolution.
Act as the escalation point for the CX team when issues require deeper technical investigation or coordination with engineering teams.
Requirements
2+ years of experience in a Production Support, Application Support, Technical Operations, Site Reliability Engineering (SRE), Support Helpdesk or Engineering role focused on production system operations.
Hands-on experience using monitoring and observability platforms (e.g., Splunk, Datadog, ELK) to investigate live incidents.
Solid experience with database systems, including the ability to write and execute complex SQL queries for data analysis and issue resolution.
Experience coordinating between CX or other non-technical teams and engineering; comfortable communicating with both technical and non-technical audiences.
Proficiency in at least one scripting language (e.g., Python, Bash) for automation and ad-hoc analysis.
Bonus: Experience with incident management tools (e.g., PagerDuty, OpsGenie) and platforms such as Jira Service Management or Zendesk.
Tech Stack
Grafana
NoSQL
Prometheus
Python
Splunk
SQL
Benefits
Long-term incentive plan with a company performance-based cash payout
Pension plan
Private medical insurance
25 days PTO
Meal allowance and flexible compensation plan (transport and nursery)
Gym membership
€450 to cover the costs associated with the adoption of a pet
Annual €150 wellness reimbursement
Flexible work hours: sometimes you'll need to be in at certain times, but on the whole we're pretty flexible when it comes to managing workload and time
Snacks and fresh fruit in our kitchen to keep you going
Regular team activities, events, game nights, and more