Proactively work with engineers to optimize schema and query patterns.
Improve the observability stack on Datadog.
Set up alerting on query performance, merges, replicas, and disk usage.
Own backup and restore processes (to a secondary cluster and to S3).
Plan and execute upgrades with zero or minimal downtime.
Test and document disaster recovery scenarios.
Own security within ClickHouse, including access control, encryption where applicable, and adherence to internal and regulatory data protection requirements.
Resolve production issues and find ways to prevent them from recurring.
Participate in the Architecture Review process to ensure that data flow and ClickHouse considerations are represented from the very start of any major change to the product portfolio.
Requirements
5+ years of hands-on ClickHouse experience and 8+ years working as a DBA with distributed OLAP databases
Experience handling data volumes at scale (e.g., 100s of TBs) and an understanding of the challenges this brings
Strong systems background with a deep understanding of IO performance and network protocols, specifically for high-volume, high-throughput analytical workloads (reads and writes!)
Proficiency in writing and optimizing complex SQL queries
Experience with Linux server administration and storage (e.g., RAID)
Experience with monitoring/alerting tools is a strong plus, particularly for database metrics and observability
Familiarity with data infrastructure, reporting systems, or analytics-heavy SaaS products