Brillio is one of the fastest-growing digital technology service providers, known for combining cutting-edge digital and design thinking skills. The company is seeking a highly experienced Senior Observability Engineer with deep expertise in the Elastic Stack to lead the development of enterprise-grade observability capabilities across mission-critical applications.
Responsibilities:
- ESS Observability Architecture and Implementation
  - Design and implement end-to-end observability solutions using ESS (Elastic Stack)
  - Build a centralized observability layer covering all MF applications
  - Ensure block-level aggregation with drill-down to:
    - Application-level metrics
    - APM traces
    - Logs and events
    - Service dependencies
- Dashboard Engineering (Critical Priority)
  - Develop and scale a large backlog of ESS dashboards, including but not limited to:
    - Cluster Health (OCP/K8s)
    - API APM Dashboards
    - Service Health and Dependency Monitoring
    - Pod Status / Restart / Scaling Metrics
    - HTTP Status Analytics (200/400/500 trends)
    - Transaction Processing Metrics
    - Infra Metrics (CPU, Memory, Disk, Network)
    - Synthetic Monitoring Availability
  - Build intuitive, drill-down dashboards from MF Block → Service → Application level
- APM, Tracing, and Monitoring Expansion
  - Expand ESS-based:
    - Application Performance Monitoring (APM)
    - Distributed tracing
    - Real User Monitoring (RUM)
    - Synthetic monitoring
  - Enable end-to-end traceability across microservices
- Proactive Observability and Alerting
  - Design and implement smart alerting rules:
    - Move from reactive → proactive detection
    - Reduce noise, improve signal quality
  - Define SLOs, SLIs, and error budgets
  - Enhance anomaly detection and trend analysis
- Collaboration and Leadership
  - Work closely with:
    - EOT Observability Team
    - Internal CDLs
    - Application teams
  - Act as ESS Observability SME
  - Provide guidance, standards, and best practices
Requirements:
- Strong hands-on experience with ESS (Elastic Stack): Elasticsearch, Logstash, Kibana, Beats / Elastic Agent, Elastic APM
- Proven experience building enterprise-scale observability dashboards in ESS
- Deep understanding of microservices architecture
- Kubernetes / OpenShift (OCP)
- Experience with APM, distributed tracing, logging, and metrics correlation
- Ability to design multi-layer observability (infra → platform → app)
- Experience with Synthetic monitoring tools integrated with ESS
- Real User Monitoring (RUM)
- Service maps and dependency graphs
- Knowledge of CI/CD observability integration
- Alerting frameworks within Elastic
- Scripting: Python / Shell / Groovy (nice to have)