Maintain, support, and enhance enterprise search applications and indexing pipelines running on Solr, Spark, Flume, AWS OpenSearch, and Linux-based infrastructure.
Perform operations and maintenance activities including patching, vulnerability management, system administration, monitoring, configuration management, and routine platform support.
Troubleshoot and resolve indexing failures, data issues, query syntax problems, relevance issues, and search performance degradation.
Monitor indexing jobs and platform health to ensure indexing SLAs and search/retrieval SLAs are consistently met.
Fine-tune search configurations, analyzers, synonyms, ranking logic, and index structures to optimize search relevance, indexing throughput, and retrieval performance.
Perform scheduled maintenance activities including weekly, monthly, and biannual data purges, index cleanup, retention-related tasks, and storage optimization.
Support quarterly bulk indexing operations, including large-scale reloads, synonym refreshes, name normalization, and post-load validation.
Work with source data stored in Microsoft SQL Server to investigate ingestion issues, validate upstream data quality, analyze indexing results, and review operational or search-related metrics captured after indexing.
Use SQL to query source and operational data, investigate discrepancies, support troubleshooting, and validate indexing outcomes and SLA-related metrics.
Use Azure Log Analytics and related monitoring tools to analyze logs, investigate failures, identify operational trends, and support root cause analysis for indexing, search, and data-processing issues.
Maintain awareness of Java-based source-to-target data comparison and validation tools, review their findings regularly, and address data quality, synchronization, and accuracy issues between source systems and indexed search platforms.
Investigate and resolve data ingestion, transformation, indexing, and retrieval issues across upstream and downstream systems.
Build and improve automation, scripts, and support tooling that increase reliability, reduce operational overhead, and improve observability.
Support modernization of legacy search capabilities toward AWS OpenSearch, semantic search, vector search, hybrid retrieval, and RAG-enabled solutions.
Document procedures, runbooks, support processes, recurring maintenance activities, and operational findings, and collaborate with cross-functional teams to improve search quality and system stability.
Requirements
Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field, or equivalent practical experience.
5+ years of experience in software engineering, enterprise search, search platform support, information retrieval, or large-scale data systems.
Hands-on experience supporting and enhancing enterprise search platforms such as Apache Solr, OpenSearch, Elasticsearch, Lucene, or similar technologies.
Experience working in Linux production environments, including troubleshooting, configuration updates, and routine system support.
Strong experience with indexing, schema design, mappings, analyzers, query processing, search relevance tuning, and performance optimization.
Experience supporting batch or large-scale data ingestion and processing pipelines using Spark and related technologies.
Strong programming skills in Java and/or Python.
Working knowledge of Microsoft SQL Server, including the ability to query source data, validate data flows, investigate discrepancies, and analyze indexing or operational metrics.
Experience using log monitoring and observability tools to investigate failures and support operational troubleshooting; exposure to Azure Log Analytics strongly preferred.
Experience troubleshooting distributed systems, indexing pipelines, data synchronization issues, and production search clusters.
Experience supporting SLA-driven operational environments with accountability for service restoration, recurring maintenance, and issue resolution.
Ability to understand and work with validation, comparison, and support tools used to verify data accuracy between source and target systems.
Strong analytical, troubleshooting, and problem-solving skills.
Strong written and verbal communication skills, including the ability to work across engineering, operations, and stakeholder teams.
Must work in the US Eastern time zone and be available to travel to the Washington, DC area once a year.