ClickHouse is a recognized, innovative cloud company focused on real-time analytics and data warehousing. The company is seeking a Database Reliability Engineer to improve the reliability, availability, and performance of ClickHouse, working closely with other teams to implement best practices and manage incident response.
Responsibilities:
- Continuously improve the reliability and performance of ClickHouse core
- Create and improve metrics and alerts for ClickHouse that identify and prevent production problems before they affect customers
- Dig into the most common problems customers encounter in ClickHouse core to identify root causes, then submit bug fixes, file issue reports, and suggest improvements
- Enhance and refine incident response processes and post-mortem analysis for outages related to ClickHouse core, including working with Support and Cloud teams to communicate with impacted customers
- Plan, enable, and drive chaos engineering initiatives across Engineering teams, based on internal priorities
- Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize customer impact
Requirements:
- Bachelor's or Master's degree in Computer Science or a related field
- At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering
- Previous experience operating ClickHouse or other SQL databases in production
- Excellent understanding of distributed database internals and SQL; knowledge of ClickHouse in particular is a major plus
- Scripting experience with Shell or Python, and the ability to read and understand C++ code
- Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform
- You are a strong problem-solver and have solid production debugging skills
- You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward
- You have a high level of responsibility, ownership, and accountability
- Excellent communication skills