Site Reliability Engineer – Data Platform at Veepee | JobVerse
Veepee
France
Full Time
2 weeks ago
No Sponsorship
Key skills
BigQuery
Cloud
Grafana
Kafka
Kubernetes
Prometheus
Terraform
EKS
GKE
S3
GitOps
Collaboration
Remote Work
About this role
Role Overview
Ensure the reliability and performance of our data platform services (Trino, Iceberg, S3, Kafka, Flink)
Define and implement SRE best practices: SLIs/SLOs, error budgets, and observability
Build and maintain monitoring, alerting, and incident response frameworks (Prometheus, Grafana, etc.)
Contribute to the migration from a public cloud data warehouse to VeepeeCloud’s lakehouse stack
Support coexistence between cloud and on-prem systems and ensure data consistency and service reliability
Help design resilient architectures for ingestion, transformation, and serving layers
Operate and improve services running on Kubernetes (GKE/EKS and on-prem clusters)
Automate infrastructure provisioning using Terraform, Atlantis, and/or Crossplane
Improve GitOps workflows for platform deployment and configuration
Collaborate with teams to optimize compute and storage usage (Trino queries, BigQuery slots, etc.)
Build tools and dashboards to track cost, usage, and efficiency
Support the transition toward cost-efficient on-prem workloads
Improve self-service capabilities for data teams (e.g., provisioning Trino/Iceberg resources)
Help teams adopt best practices in reliability, observability, and deployment
Write clear technical documentation and runbooks
Contribute to the definition and implementation of the Disaster Recovery Plan (DRP)
Ensure multi-DC resilience (FR1 / NL1) and implement data replication strategies
Participate in incident management and postmortems
Requirements
Strong experience with Kubernetes in production environments
Experience with distributed data systems (or a strong willingness to learn)
Solid understanding of SRE principles (monitoring, alerting, SLAs/SLOs)
Experience with Infrastructure as Code (Terraform or similar tools)
Familiarity with GitOps workflows
Experience with observability tools (Prometheus, Grafana, logging systems)
Comfortable working in cloud environments
Strong collaboration mindset and the ability to work across teams
Fluent in English
Tech Stack
BigQuery
Cloud
Grafana
Kafka
Kubernetes
Prometheus
Terraform
Benefits
Variable bonus
Dynamic and creative environment within international teams
Access to a variety of self-learning courses on our e-learning platform
Opportunity to participate in local and international meetups and conferences
Flexible office policy with up to 3 days of remote work per week