Placer.ai is transforming how organizations understand the physical world through its location analytics platform. The company is seeking a Data Platform Engineer to own and scale the Kubernetes infrastructure behind its large-scale data processing platform, with a focus on making distributed data workloads reliable, cost-efficient, and performant at scale.
Responsibilities:
- Operate and scale Kubernetes clusters with thousands of nodes supporting large-scale Spark and data processing workloads
- Manage and optimize Apache Spark on Kubernetes — executor autoscaling, driver scheduling, resource tuning, spot instance strategies (see the configuration sketch after this list)
- Deploy and tune remote shuffle services (e.g., Apache Celeborn) to handle shuffle data at scale across multiple availability zones
- Operate and improve self-hosted Apache Airflow infrastructure on Kubernetes
- Configure and optimize batch schedulers (e.g., YuniKorn, Volcano) for gang scheduling, fair-share queuing, and resource prioritization
- Drive cost optimization across large compute fleets — spot vs. on-demand strategies, node right-sizing, autoscaling policies, local SSD utilization
- Partner with Data Engineering teams on workload performance, resource allocation, and infrastructure requirements
- Manage infrastructure-as-code (Terraform) and GitOps deployments (ArgoCD, Helm) for data platform services
- Integrate with managed data platforms (e.g., Databricks) and cloud storage for hybrid processing architectures
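To give a concrete sense of the Spark-on-Kubernetes work described above, here is a minimal, hypothetical configuration sketch showing dynamic executor allocation, spot-node placement, YuniKorn queue scheduling, and Celeborn remote shuffle wired together through Spark properties. The queue name, node label, and Celeborn endpoint are illustrative assumptions, not Placer.ai's actual settings.

```python
# Minimal sketch of the Spark-on-Kubernetes properties this role tunes.
# Values (queue name, node label, Celeborn endpoint) are illustrative only.

spark_conf = {
    # Dynamic allocation: grow and shrink the executor fleet with load.
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "200",

    # Place executors on spot capacity (GKE spot node label shown here).
    "spark.kubernetes.executor.node.selector.cloud.google.com/gke-spot": "true",

    # Delegate pod scheduling to YuniKorn for gang scheduling and
    # fair-share queues; the queue path is an assumption.
    "spark.kubernetes.scheduler.name": "yunikorn",
    "spark.kubernetes.driver.annotation.yunikorn.apache.org/queue": "root.spark",

    # Push shuffle data to Apache Celeborn so executors stay stateless
    # and spot reclamation does not destroy shuffle output.
    "spark.shuffle.manager": "org.apache.spark.shuffle.celeborn.SparkShuffleManager",
    "spark.celeborn.master.endpoints": "celeborn-master-0.celeborn:9097",
}

# Render the properties as spark-submit flags.
flags = " ".join(f"--conf {key}={value}" for key, value in spark_conf.items())
print(f"spark-submit {flags} --class com.example.Job job.jar")
```

Routing shuffle through a remote service like Celeborn is what makes aggressive spot usage viable: executors hold no shuffle state, so losing a reclaimed node does not force recomputation of completed map output.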
Requirements:
- 3+ years of experience operating Kubernetes in production at significant scale (hundreds to thousands of nodes)
- Hands-on experience with Apache Spark on Kubernetes — you understand executors, drivers, dynamic allocation, shuffle behavior, and how they map to K8s primitives
- Strong understanding of Kubernetes internals — scheduling, resource management, node autoscaling, pod lifecycle, taints/tolerations, local storage
- Experience with cloud infrastructure (GCP preferred) — managed Kubernetes, spot/preemptible instances, local SSDs, networking at scale
- Comfortable with infrastructure-as-code (Terraform) and GitOps workflows
- Proficiency in Python or Go
- Experience operating Apache Airflow at scale on Kubernetes
- Experience with Apache Celeborn or similar remote shuffle services
- Familiarity with YuniKorn or Volcano batch schedulers
- Experience with Databricks administration and integration
- Knowledge of data formats and storage systems (Parquet, Delta Lake, cloud object storage)
- Experience with streaming or messaging systems (Kafka)
- Experience with Prometheus/Grafana observability stacks for data platform monitoring
- Contributions to open-source data infrastructure projects