Cloudera is a leading data partner for top companies, empowering people to transform complex data into actionable insights. The Senior Data Engineer will apply expertise in data ecosystem engines to validate use cases and ensure that data pipelines function correctly.
Responsibilities:
- Data Modeling & Proactive Data Quality: Managing complex data models and schema drift, and embedding automated data quality checks and statistical anomaly detection directly into pipelines to replace reactive, manual quality processes
- Unified Data Governance Integration: Working with governance layers to ensure policies like tag-driven Attribute-Based Access Control (ABAC), column-level masking, row-level filters, and zero-code lineage ingestion (e.g., Octopai) are accurately enforced at the data layer
Requirements:
- Hands-on understanding of the internals of modern compute, streaming, and orchestration engines such as Spark, Kafka, Trino, and Airflow
- Understanding of Kubernetes internals (CRDs, Controllers, Operators, Namespaces), including the ability to debug and test complex Helm chart deployments and their dependencies
- Expert-level proficiency in Python/Shell for scripting and automation
- Bachelor's or Master's degree in Computer Science or equivalent experience
- 8+ years of software engineering experience with a focus on test automation, infrastructure, or backend development