Administer the monitoring infrastructure (SCOM / Checkmk / Elastic), ensuring that it is stable, up to date, well designed, properly tuned and properly maintained.
Extend the current infrastructure and/or implement a new infrastructure following capacity management plans.
Develop, improve, configure and maintain monitoring.
Develop integrations through APIs.
Develop ad-hoc monitors when required.
Develop SCOM management packs.
Develop and maintain automations using Ansible.
Develop reports.
Identify and troubleshoot known problems and document solutions.
Provide guidance to Tier 1 workforce.
Provide guidance and coach team members when needed.
Create relevant statistics for the various monitoring tools.
Perform required tasks in maintenance windows.
Provide “stand-by” services on a rotation basis during weekends, holidays and outside of normal working hours.
Perform other duties as required.
Requirements
Good experience with the SCOM, Checkmk and Elastic Observability (Elasticsearch & APM) monitoring tools.
Good scripting knowledge in Bash, PowerShell, Python or similar.
Programming skills in .NET, C#, Python
Proficient in English
Knowledge of other monitoring tools such as OEM or Prometheus is also desirable.
Experience with a full-stack observability platform is also desirable.
Experience with automation tools and runbooks (e.g. Ansible, YAML) is also desirable.
Experience with KQL, ES|QL, PromQL, JSON and dashboards in Kibana is also desirable.
Experience with reporting tools such as SSRS and Power BI is also desirable.
Experience with DevOps practices is desirable.
Required Soft Skills:
Customer facing experience and oral communication skills
Ability to write documentation & reports
Creativity and ability to find innovative solutions
Willingness to learn on the job
Conflict management & cooperation
Experience with and understanding of complex enterprise environments
Required certifications (any of the following):
Elasticsearch Engineer
Elastic Observability Engineer
System Monitoring with Checkmk
System Center Operations Manager
Tech Stack
Ansible
Elasticsearch
Prometheus
Python
.NET
Benefits
Teleworking Option: Yes
On-call requirements: One week per month (rotation is subject to the number of team members)