Cloudera is a leader in data management and analytics, empowering organizations to transform complex data into actionable insights. As a Senior Engineer on the Cloudera Context Search Team, you will design and implement high-performance search infrastructure, bridging big data storage with real-time retrieval to enhance data discovery and analytics for large enterprises.
Responsibilities:
- Architect & Scale: Design and implement large-scale OpenSearch clusters capable of handling petabytes of data with low-latency indexing and query performance
- Platform Integration: Deeply integrate OpenSearch with CDP components (e.g., Apache Iceberg, SDX, and Ozone) to provide a unified search experience across the data lakehouse
- Performance Tuning: Optimize JVM settings, shard allocation strategies, and query DSL to ensure maximum throughput and stability
- Security & Governance: Implement enterprise-grade security including RBAC, TLS, and audit logging, ensuring compliance with Cloudera’s Shared Data Experience (SDX) standards
- Cloud Native Operations: Develop and maintain Kubernetes Operators and Helm charts for automated deployment, scaling, and self-healing of search services
- Community Contribution: Act as a liaison to the upstream OpenSearch community, contributing bug fixes, features, and performance improvements
Requirements:
- Bachelor's degree in Computer Science or equivalent and 5-6 years of related experience; OR Master's degree and 3-5 years of related experience; OR PhD and 0-3 years of related experience
- 5+ years of experience working with OpenSearch or Elasticsearch in a production environment at scale
- Strong understanding of distributed systems concepts (consensus algorithms, the CAP theorem, replication, and sharding)
- Proficiency in Java (core OpenSearch development) and/or Go/Python for automation and tooling
- Extensive experience with Kubernetes (K8s) and container orchestration
- Hands-on experience deploying search workloads on AWS (EKS/AOSS), Azure (AKS), or Google Cloud (GKE)
- Familiarity with the Hadoop ecosystem or related data processing engines such as Spark, Flink, and Hive
- Experience with Lucene internals (segment merging, bitsets, and codecs)
- Knowledge of vector database capabilities within OpenSearch for generative AI use cases such as retrieval-augmented generation (RAG)
- History of contributing to open-source projects (Apache Software Foundation or OpenSearch Project)