Tabby is a financial technology company that creates financial freedom in the way people shop, earn, and save. The company is seeking a Senior Observability and Monitoring Engineer to play a key part in its strategic shift toward a self-hosted observability stack, with responsibility for the design, evolution, and operation of its monitoring, logging, and tracing platforms.
Responsibilities:
- Operate and maintain Elastic Enterprise clusters (index lifecycle, scaling, retention, access control, backups)
- Design and manage log pipelines (Fluentd / Fluent Bit / Logstash / Beats) for applications, databases, and infrastructure
- Support and gradually decommission Datadog integrations (logs, metrics, APM, RUM, dashboards, monitors)
- Collaborate with SRE, DevOps, and Security teams to ensure the observability stack meets SOC 2 compliance requirements and integrates with SIEM tooling
- Define and maintain SLIs, SLOs, and error budgets, improving service reliability and visibility
- Implement and optimize metrics and APM solutions (Prometheus, VictoriaMetrics, Mimir, etc.)
- Automate observability infrastructure with Terraform, Helm, and GitOps tools (FluxCD or ArgoCD)
- Document architecture, data flows, and best practices
Requirements:
- 4+ years in DevOps / SRE / Observability roles
- Solid experience with the Elastic Stack (Elasticsearch, Logstash, Kibana, Beats, and other log shippers)
- Hands-on experience building or running APM / Metrics platforms (Prometheus, VictoriaMetrics, Mimir, etc.)
- Exposure to security monitoring / SIEM integration
- Strong experience with Grafana Labs tools such as Loki and Tempo
- Familiarity with Kafka or ClickHouse
- Knowledge of Kubernetes and cloud platforms (GCP, OCI, or AWS)