Role Overview

Drive the end-to-end migration of a legacy, rules-based text processing platform to a modern, Python-based natural language processing (NLP) architecture, improving scalability, maintainability, and long-term extensibility.
Modernize legacy extraction logic.
Analyze existing grammar, pattern rules, and linguistic logic from legacy systems and refactor them into contemporary NLP pipelines using industry-standard libraries and frameworks.
Design and implement context-aware extraction.
Build, test, and maintain high-precision context extraction logic to identify entities, attributes, events, and relationships that support new product development and advanced analytics use cases.
Apply ontologies and taxonomies to text understanding.
Develop, extend, and apply domain ontologies and taxonomies to standardize language interpretation, support semantic consistency, and improve the accuracy and explainability of extraction results.

Requirements

Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field (or equivalent experience) is required.
7+ years of experience in data analysis, transformation, and development, with a strong specialization in natural language processing in the insurance industry.
5+ years of strong proficiency in SQL and Python.
Strong background in natural language processing and text analytics, with hands-on experience building rule-based and hybrid (rules + ML/AI) extraction solutions.
Applied experience designing and using ontologies and taxonomies to model domain concepts, normalize language, and support semantic interpretation of unstructured text.
Deep expertise in context-aware information extraction, including pattern matching, dependency-based rules, negation handling, section awareness, and linguistic feature engineering.
Proficiency in Python software development, including building production-ready libraries, writing maintainable code, and debugging complex text processing pipelines.
Experience working with large volumes of unstructured or semi-structured text, such as documents, notes, forms, or free-text fields.
Ability to align extraction outputs to canonical domain models, controlled vocabularies, and taxonomy structures.
Familiarity with extraction evaluation and tuning, including precision/recall measurement, error analysis, and iterative rule optimization.
Understanding of data integration and pipeline concepts, including how extracted features are consumed by downstream analytics, ML models, or enterprise applications.
Ability to translate evolving business and product requirements into precise, testable extraction logic.
Experience supporting new product development, where requirements are iterative and domain models, terminology, and extraction strategies evolve over time.
Strong communication skills to explain NLP behavior, limitations, and tradeoffs to both technical and non-technical stakeholders.
Candidate must be authorized to work in the US without company sponsorship.

Tech Stack

Python
SQL

Benefits

Short-term or annual bonuses
Long-term incentives
On-the-spot recognition

Senior Knowledge Engineer, Text Factory
