Grafana Labs is a remote-first, open-source powerhouse with over 20M users of its visualization tool. The role focuses on providing automation for provisioning cloud resources and enhancing developer productivity within a collaborative environment.
Responsibilities:
- We are hiring for the Platform InfraCore squad. This squad owns the automation for provisioning cloud service provider (CSP) resources, including networking, Kubernetes clusters, and the specific CSP resources our application teams require
- We’re part of a Platform Engineering group that manages infrastructure for the teams building some of the most cherished Platform tools: Grafana, Mimir, Loki, Tempo, and Pyroscope, to name a few
- We invest heavily in developer productivity. You can use modern AI coding assistants as part of your daily workflow (your choice of tools, within security guidelines), backed by a company-funded usage budget so you can iterate quickly without unnecessary friction
- We encourage pragmatic AI-assisted development: faster prototyping, test generation, refactors, documentation, and incident follow-ups—always paired with strong code review and quality standards
- You’ll also have access to frontier models (e.g., GPT-Codex 5/3, Claude Opus 4.6, Gemini 3 Pro)
- Kubernetes cluster provisioning and lifecycle management
- Management of cluster networking components: load balancing, NAT, DNS, CNIs, network policies, private connectivity for customers, cross-cluster communication
- Management of scheduling and autoscaling
- Maintaining Crossplane compositions and Terraform modules for CSP resources common to our users, as well as managing versioning and compatibility for Crossplane and Terraform core and their providers
- Work with our users (Grafana Cloud application teams) to help understand their needs and ensure we’re investing in the right capabilities
- Participation in the Platform department Infrastructure wing on-call rotation
Requirements:
- You've worked in or on open source, or other community-based projects previously. At Grafana Labs, 'OSS is in our DNA'
- Experience with a few CSPs. We run Grafana Cloud on AWS, GCP, and Azure using each provider's managed Kubernetes service: EKS, GKE, and AKS
- Experience operating and managing workloads on Kubernetes. We use Tanka for configuration management with Jsonnet
- Familiarity with Kubernetes scheduling and projects like Karpenter
- Terraform and/or Crossplane experience. We have mixed usage - each has its strengths
- Enjoys programming in Go. We love building our own tools, utilities, exporters, etc. that suit our needs and otherwise don't exist (and open sourcing them)
- Likes to think about operational maturity. Find anything particularly interesting about the CNCF Platform Engineering Maturity Model?