About this role

Job Summary

We are seeking a highly skilled Iceberg DBA / Lakehouse Operations Engineer to own the reliability, performance, and operational integrity of the Iceberg data layer powering enterprise analytics and business-critical applications. This role operates in a large-scale, multi-engine Lakehouse environment, supporting workloads across Spark, Hive, and Impala, and plays a key role in enterprise data modernization initiatives (Hive and Teradata → Iceberg).

The ideal candidate brings deep expertise in Iceberg table operations, metadata management, and query performance optimization, ensuring consistent, high-performance data access across platforms. This role is critical to ensuring data accuracy and performance: any degradation directly impacts downstream reporting, analytics, and business-critical decision-making.

Key Responsibilities

- Own day-to-day operations of Apache Iceberg tables supporting multiple enterprise applications
- Ensure data reliability, consistency, and availability across all Lakehouse workloads
- Maintain operational integrity for datasets at multi-terabyte to petabyte scale
- Execute advanced Iceberg table maintenance and optimization strategies:
  - Compaction (minor/major) and small file mitigation
  - Snapshot expiration and metadata compaction to control metadata growth
  - Orphan file cleanup (vacuum) to maintain storage efficiency
- Optimize data layout and performance through:
  - File size tuning and distribution strategies
  - Partition evolution and pruning optimization
  - Clustering and ordering techniques (e.g., Z-ordering or similar patterns)
- Support and enforce data modeling best practices aligned with:
  - Normalized data structures (3NF) for source-aligned datasets
  - Medallion architecture (Bronze / Silver / Gold layers) for curated data flows
- Ensure Iceberg table design aligns with:
  - Data ingestion patterns (raw vs. curated layers)
  - Downstream consumption and performance requirements
- Assist in structuring datasets to balance:
  - Data integrity and normalization
  - Query performance and analytical efficiency
- Work with data engineering teams to ensure consistent implementation of layered data architecture across multiple applications
- Ensure consistent and performant query behavior across:
  - Spark (CDE)
  - Hive / Impala (CDW)
- Troubleshoot and resolve:
  - Query performance bottlenecks
  - Metadata inconsistencies across engines
  - Inefficient execution plans and scan patterns
- Play a key role in enterprise data platform modernization (Hive and Teradata → Iceberg)
- Support:
  - Schema alignment and data type mapping
  - Data validation and reconciliation
- Troubleshoot migration-related issues and ensure post-migration stability and performance
- Manage Iceberg metadata to ensure:
  - Efficient scaling and performance
  - Consistent table state across engines
- Execute lifecycle operations:
  - Data retention and archival policies
  - Snapshot lifecycle management and cleanup
  - Time-travel optimization and maintenance
- Provide L2/L3 support for data-related production issues across Iceberg-based Lakehouse workloads
- Participate in the on-call rotation to support critical data platforms and ensure timely response to incidents
- Respond to and resolve P1/P2 production incidents within defined SLAs, minimizing impact on downstream applications and reporting
- Troubleshoot:
  - Data inconsistencies and reporting discrepancies
  - Query failures and performance degradation
- Perform root cause analysis (RCA) and implement preventive measures to avoid recurring issues
- Collaborate with platform and application teams during incident triage and resolution
- Support fine-grained access control using:
  - Ranger policies and RBAC
- Own data validation, reconciliation, and accuracy between source and Iceberg datasets
- Ensure secure and compliant access to data across applications

Required Qualifications

- 10+ years of experience in Big Data / Data Engineering / DBA / Data Operations roles
- 2+ years of hands-on experience with Apache Iceberg in production environments
- 6+ years of experience working with the Cloudera ecosystem (CDP)
- Strong expertise in:
  - Iceberg table optimization (compaction, metadata management, partition evolution)
  - Multi-engine performance tuning (Spark, Hive, Impala)
  - Troubleshooting complex data and query performance issues
- Proven experience handling:
  - P1/P2 production incidents
  - Large-scale environments (TB/PB scale)
  - Data migration initiatives (Hive/Teradata → Iceberg)
- Lead enforcement of data modeling and Lakehouse standards across applications
- Guide teams on Medallion architecture implementation and balancing normalization vs. performance
- Review and resolve complex data modeling and performance trade-offs
- Ensure consistency of data structures across domains and workloads
- Mentor and guide L2 resources in operational best practices and troubleshooting
- Strong hands-on experience with Apache Iceberg and/or Hive-based data lakes
- Understanding of data modeling concepts (normal forms) and modern Lakehouse patterns (Medallion architecture)
- Expertise in table-level optimization, performance tuning, and large-scale data management (TB/PB scale)
- Experience with: Spark SQL, Hive, Impala, NiFi, Trino
- Strong understanding of partitioning strategies, file formats (Parquet/ORC), and distributed query processing

Preferred Qualifications

- Experience with:
  - Hive-to-Iceberg or Teradata-to-Iceberg migration
  - Cloudera CDP (CDE/CDW)
- Familiarity with:
  - Cloud platforms (AWS, Azure)
  - Scripting/automation (Python, Shell)

Education: Bachelor's Degree