Overview:
We are seeking an experienced ITOps Consultant – Monitoring & Observability to design, implement, and operate enterprise-grade monitoring solutions. This role focuses on ensuring high availability, performance, and reliability of IT infrastructure and applications through modern observability practices.
The ideal candidate will have 2–5 years of hands-on experience in monitoring and observability platforms, with OpsRamp as a primary or preferred tool. Candidates with strong experience in Datadog or Dynatrace and proven capability to integrate monitoring tools with ITSM platforms are also encouraged to apply.
This is a rotational-shift role supporting 24x7 operations.
Primary Responsibilities:
· Deploy, configure, and operate OpsRamp as the core monitoring platform, including onboarding devices, applications, and services.
· For candidates without an OpsRamp background, quickly adapt and transfer experience from tools such as the LGTM stack, Datadog, OpenText monitoring tools, New Relic, or any SaaS monitoring/observability tool into the OpsRamp ecosystem.
· Integrate monitoring platforms with IT Service Management (ITSM) tools (e.g., ServiceNow, BMC Remedy) for incident, event, and alert management.
· Develop and maintain dashboards, alerts, SLIs/SLOs, and reports to ensure proactive issue detection and faster incident resolution.
· Tune alert thresholds and correlation rules to reduce alert noise and improve signal quality.
· Support hybrid and multi-cloud environments, including on-premises, cloud, and containerized platforms.
· Collaborate with Infrastructure, Application, and DevOps teams to integrate monitoring into CI/CD pipelines and operational workflows.
· Automate monitoring, alerting, remediation, and reporting using scripts, APIs, and orchestration tools.
· Leverage AIOps capabilities for anomaly detection, event correlation, and predictive insights.
· Participate in rotational shifts and support operational monitoring, incident triage, and root cause analysis (RCA).
· Document monitoring architectures, runbooks, configurations, and standard operating procedures (SOPs).
Required Skills:
Mandatory / Core Skills
· 2–5 years of experience in IT Operations, Monitoring, or Observability roles.
· Hands-on experience with OpsRamp for monitoring deployment, configuration, and operations,
o OR strong experience with the LGTM stack, Datadog, OpenText monitoring tools, New Relic, or any SaaS monitoring/observability tool, with readiness to work on OpsRamp.
· Proven experience integrating monitoring tools with ITSM platforms.
· Strong understanding of metrics, logs, traces, and observability best practices.
Technical Skills
· Experience with open-source monitoring tools:
o Prometheus, Grafana
o ELK Stack (Elasticsearch, Logstash, Kibana), Fluentd
o Tracing tools such as Jaeger or Zipkin
· Working knowledge of REST APIs and API-based integrations.
· Scripting/automation experience using Python, Ansible, or similar tools.
· Familiarity with AIOps concepts, anomaly detection, and intelligent alerting.
· Understanding of ITIL processes and service management frameworks.
· Exposure to security monitoring and compliance requirements is a plus.
Soft Skills
· Strong analytical and troubleshooting skills for complex production issues.
· Ability to work effectively in 24x7 rotational shifts.
· Good communication skills and the ability to work with cross-functional teams and business stakeholders.
· Ownership mindset with a focus on reliability and operational excellence.
Nice to Have
· OpsRamp certification or hands-on production deployment experience.
· Experience monitoring Kubernetes / OpenShift environments.