Lead advanced troubleshooting, root cause analysis and blameless post-mortems — driving structural change afterwards, not just producing a report;
Implement automation using Infrastructure as Code;
Analyze and optimize cloud costs: rightsizing, usage analysis and proposing data-driven alternatives;
Act as a technical reference for developers and engineers, influencing architecture without relying on formal authority.
Requirements
Required: production experience operating core primitives in AWS (~4+ years): EC2, VPC/networking, IAM and security — production operation and technical decision-making;
Linux and networking (~4+ years): server administration and production troubleshooting — disk full, OOM killer, network diagnostics; processes, memory and I/O;
CI/CD built from scratch (~3+ years): pipelines created and evolved by you (GitHub Actions, Jenkins, self-hosted runners, secrets, caching, gates);
End-to-end open-source observability (~2+ years): Prometheus, Grafana, Loki, VictoriaMetrics or equivalents — configured and operated by you, not just used. OpenTelemetry — including instrumentation, exporters, PromQL and SLI/SLO definition;
Operation under managed layers: concrete experience with nginx/HAProxy/Envoy, Linux underneath, and leading the resolution of critical incidents you have driven;
Docker in production (~3+ years): real operation of containers in critical environments — volumes, networking, resource management, graceful shutdown of services;
High autonomy: receives an ambiguous problem ("our observability is weak") and delivers end-to-end;
Ownership and proactivity: anticipates problems before they become incidents;
Clear communication and technical influence, connecting development, infrastructure and business teams;
Conducts post-mortems focused on root cause, organizational learning and continuous improvement, without a blame culture;
Maturity to self-manage while working remotely.
Tech Stack
AWS
Cloud
Docker
EC2
Grafana
HAProxy
Jenkins
Linux
NGINX
Prometheus
Benefits
💻 Equipment allowance – to ensure a comfortable work setup;
💚 Health support – because your well-being matters;
📚 Education support – we support your continuous development journey;
🎂 Birthday gift – because we like to celebrate together;
🏅 Recognition for tenure – your time with us is valued;
🗣️ Language support – to help you go beyond borders;