Leads a team responsible for enterprise observability platforms and core development tooling, enabling fast detection, diagnosis, and resolution of production issues
Owns the reliability, security, scalability, performance, and instrumentation of CI/CD and developer platforms
Partners with engineering, SRE/Operations, and Security to embed observability and operational excellence across the software delivery lifecycle
Hire and coach the team, set priorities, and build a culture of reliability, ownership, learning, and continuous improvement
Define the vision, roadmap, and standards for metrics, logs, traces, alerting, dashboards, and service health
Provide governance and strategic oversight for Azure DevOps, GitHub Enterprise, Jenkins, and related tooling
Drive reliability and incident-management excellence
Manage platforms via Infrastructure as Code; standardize configurations and operational practices
Ensure that auditability, access controls and reviews, and logging meet enterprise requirements
Translate technical information into business value and coordinate stakeholders across teams
Requirements
Experience leading engineering teams in Observability/SRE/Platform/DevOps Tools
Hands-on background with observability platforms (e.g., Datadog, Splunk, New Relic, Grafana, OpenTelemetry)