Design and implement large-scale OpenSearch clusters capable of handling petabytes of data with low-latency indexing and query performance.
Deeply integrate OpenSearch with Cloudera Data Platform (CDP) components such as Apache Iceberg, SDX, and Ozone to provide a unified search experience across the data lakehouse.
Optimize JVM settings, shard allocation strategies, and query DSL to ensure maximum throughput and stability.
Implement enterprise-grade security including RBAC, TLS, and audit logging, ensuring compliance with Cloudera’s Shared Data Experience (SDX) standards.
Develop and maintain Kubernetes Operators and Helm charts for automated deployment, scaling, and self-healing of search services.
Act as a liaison to the upstream OpenSearch community, contributing bug fixes, features, and performance improvements.
Requirements
5+ years of experience working with OpenSearch or Elasticsearch in a production environment at scale.
Strong understanding of distributed systems concepts (consensus algorithms, the CAP theorem, replication, and sharding).
Proficiency in Java (core OpenSearch development) and/or Go/Python for automation and tooling.
Extensive experience with Kubernetes (K8s) and container orchestration.
Hands-on experience deploying search workloads on AWS (EKS/AOSS), Azure (AKS), or Google Cloud (GKE).
Familiarity with the Hadoop ecosystem or modern equivalents like Spark, Flink, and Hive is a major plus.
Experience with Lucene internals (segment merging, bitsets, and codecs) is preferred.
Knowledge of vector database capabilities within OpenSearch for generative AI use cases such as retrieval-augmented generation (RAG) is preferred.
History of contributing to open-source projects (Apache Software Foundation or OpenSearch Project) is preferred.