Dayforce is a global human capital management company that offers a unified Cloud HCM platform. As a Lead Observability Engineer, you will provide senior technical leadership in the implementation and continuous improvement of Dayforce’s observability platform, ensuring reliable telemetry collection and operational workflows across distributed systems.

Responsibilities:

Design, implement, and operate components of the Dayforce observability platform in alignment with architectural standards and platform strategy
Lead implementation, tuning, and operational improvements across observability tooling including metrics, logs, traces, dashboards, alerting, and synthetic monitoring
Apply best practices for telemetry collection and instrumentation across application and infrastructure workloads
Build, maintain, and enhance dashboards and alerting mechanisms to support service ownership and incident response
Enable and onboard engineering and infrastructure teams to drive consistent adoption and effective platform usage
Design and optimize data pipelines for high-cardinality telemetry data, balancing performance, reliability, and cost
Partner with platform and engineering teams to gather requirements and deliver solutions aligned to operational needs
Provide mentorship through code reviews, documentation, and knowledge sharing
Participate in on-call rotations and operational reviews to drive reliability improvements and post-incident learnings

Requirements:

Must be a US citizen
Ability to obtain US security clearance
Strong communication and collaboration skills across engineering and infrastructure teams
Ability to gather requirements, prioritize effectively, and deliver high-quality solutions within defined scope
Significant experience operating and troubleshooting distributed systems in production environments
Experience implementing and operating observability platforms including metrics, logging, tracing, and alerting systems
Hands-on experience with OpenTelemetry, distributed tracing, and APM tooling
Experience working with data pipelines, ETL processes, and high-cardinality telemetry datasets
Proficiency in at least one object-oriented programming language and one scripting language
Demonstrated ability to deliver scalable, reliable, and maintainable technical solutions
Strong interest in learning and adopting emerging technologies within an established architectural framework
Bachelor's degree plus 5–10 years of related experience, Master's degree plus 6 years of related experience, or equivalent combination of education and experience
Experience operating and tuning observability storage systems such as ClickHouse
Hands-on experience with Kubernetes observability and monitoring containerized workloads
Experience extending or integrating Grafana dashboards, data sources, or plugins
Familiarity with applying AI-assisted tooling to observability workflows
Contributions to internal tooling, automation, or documentation that improved observability adoption and usability

Cloud Observability Engineer Lead
