Grafana Labs is a remote-first, open-source powerhouse with over 20 million users globally. They are seeking a Staff Software Engineer to enhance their Cloud Observability platform, focusing on metrics, logs, and traces integration, while collaborating with teams to improve infrastructure monitoring capabilities.

Responsibilities:

Design and implement high-quality, scalable integrations for various infrastructure components, applications, and data ingestion pipelines
Create middleware components and libraries that simplify development and maintenance of observability solutions
When necessary, represent Grafana Labs in open source forums, working groups, and events
Work with product, design, and docs teams to develop features that align with the wider product strategy and customer needs
Lead the technical direction and vision of the team, contributing to strategic discussions and future development of observability solutions
Work with other departments including Sales, Product, and Support teams to deliver a holistic product experience
Take ownership of the services you’re running by deploying well-tested, clean code
Embrace our open-source culture and contribute to other projects that may not directly fall within your team’s scope

Requirements:

8+ years of strong experience with at least one major programming language (Python, .NET, Java, Go, Rust, etc.)
Demonstrated experience operating and monitoring high-scale production systems running on Kubernetes, including on-call participation, incident response, and postmortem practices
Familiarity with observability tooling (e.g. Grafana)
Strong understanding of time-series data, metrics cardinality challenges, and cost/performance tradeoffs/optimizations in observability systems
Experience in a hands-on technical leadership role - setting technical direction, leading project teams, and influencing architectural decisions beyond your immediate team
Deep understanding of distributed systems concepts including scalability, consistency, high availability, and failure modes in large-scale systems
Experience writing clean, maintainable, robust, and performant software
Experience with delivering projects from start to finish in a self-driven manner
Excellent problem-solving and debugging skills
Strong mentoring and leadership skills
Experience operating or scaling Prometheus in high-cardinality, multi-tenant environments
Experience working with OpenTelemetry Collector pipelines or similar telemetry ingestion systems
Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or any other Kubernetes-related certification from CNCF
Experience developing Kubernetes operators, controllers, or custom resources
Strong understanding of metrics collection, visualization, and alerting concepts
Experience contributing to or maintaining open source projects, with evidence of successful pull requests and community collaboration
Experience designing and building observability backends for various systems and applications

Staff Software Engineer - Grafana Cloud Observability, Kubernetes Monitoring | USA - EST only | Remote
