Own the ongoing configuration, health, and governance of the shared Datadog tenant — maintaining tagging standards, RBAC, agent configurations, log pipelines, monitors, and dashboards as the portfolio grows and the platform evolves.
Lead the structured onboarding of product teams onto the shared observability standard, working directly with DevOps leads to deploy agents, configure APM instrumentation, establish baseline monitors, and validate compliance with platform standards.
Plan and execute migrations of product teams from fragmented or non-standard observability tooling onto the shared Datadog platform, prioritizing by product criticality and coordinating with product teams to minimize disruption.
Drive APM instrumentation across the portfolio, working with development teams to instrument applications for distributed tracing using Datadog APM libraries appropriate to each runtime.
Support product teams in progressing through the observability maturity framework — from baseline instrumentation through operational monitoring and SLO definition — providing hands-on guidance, runbooks, and training to build self-sufficiency.
Collaborate with operational teams to ensure alert routing, escalation paths, and incident management integrations are correctly configured and maintained across all onboarded products.
Monitor Datadog log ingestion volumes, custom metric cardinality, and index usage across the shared tenant. Enforce retention policies and exclusion filters to keep costs predictable as adoption scales.
Contribute to the continuous improvement of the observability platform — identifying gaps, proposing enhancements, and working with the architecture team to implement improvements that benefit the full portfolio.
Serve as the observability lead within the Cloud Center of Excellence Community of Practice — running enablement sessions, publishing best practices, and acting as the primary point of contact for observability questions across the organization.
Requirements
5+ years of experience in DevOps, Platform Engineering, SRE, or Observability Engineering
Deep hands-on expertise with Datadog across the full product surface: Infrastructure Monitoring, APM, Log Management, Synthetics, Dashboards, Monitors, and Alerts
Strong experience operating shared Datadog tenants at enterprise scale, including tagging strategy, RBAC, and multi-team configuration management
Proficiency with Infrastructure as Code using Terraform, including using the Datadog Terraform provider to manage monitors, dashboards, and log pipelines as code
Hands-on experience deploying the Datadog Agent on Kubernetes (Helm DaemonSet) and with Ansible, and managing API keys securely using cloud-native secrets management
Familiarity with GitHub Actions and experience integrating observability tooling into CI/CD pipelines
Strong understanding of structured logging standards, distributed tracing (W3C Trace Context), and APM instrumentation patterns across common runtimes (Java, .NET, Python, Node.js)
Experience with PagerDuty integration, on-call schedule management, and alert routing design
Demonstrated ability to work directly with product and engineering teams in an enablement and advisory capacity
Ability to work effectively in a remote environment with strong self-direction and communication skills
Excellent English communication skills for collaboration with US-based leadership, global team members, and product engineering teams
Tech Stack
Ansible
Cloud
Java
JavaScript
Kubernetes
Node.js
Python
Terraform
.NET
Additional Information
Background checks are required for employment with insightsoftware, where permitted by the laws of the applicable country, state, or province.
All your information will be kept confidential according to EEO guidelines.
We empower leaders from over 32,000 organizations to make timely and intelligent decisions.
Our comprehensive solutions span Financial Planning and Analysis (FP&A), Controllership, and Data and Analytics.
We deliver finance teams the insights required to navigate any economic climate and drive greater financial intelligence, while increasing productivity, visibility, accuracy, and compliance.