Cloudera is a leading data partner for top companies, transforming complex data into actionable insights. The Senior Data Engineer role involves designing and executing test plans for validating data pipelines, ensuring quality and governance in a hybrid cloud environment.
Responsibilities:
- End-to-End Data Pipeline Validation: Design and execute test plans validating the end-to-end cluster creation flow on a Kubernetes platform
- Data Modeling & Proactive Data Quality: Manage complex data modeling and schema drift, and embed automated data quality checks and statistical anomaly detection directly into pipelines to replace reactive, manual quality processes
- Unified Data Governance Integration: Work with governance layers to ensure that policies such as tag-driven Attribute-Based Access Control (ABAC), column-level masking, row-level filters, and zero-code lineage ingestion (e.g., Octopai) are accurately enforced at the data layer
Requirements:
- AI-First Mindset: Ability to learn and develop AI-enabled test automation frameworks
- Engine SME Expertise: Hands-on understanding of the internals of modern compute and streaming engines such as Spark, Kafka, Trino, and Airflow
- Kubernetes Expertise: Understanding of Kubernetes internals (CRDs, Controllers, Operators, Namespaces), including the ability to debug and test complex Helm chart deployments and their dependencies
- Language Proficiency: Expert-level proficiency in Python/Shell for scripting and automation
- Education: Bachelor's or Master's degree in Computer Science or equivalent experience
- Experience: 8+ years of software engineering experience with a focus on test automation, infrastructure, or backend development