Design and implement scalable ML pipelines for entity deduplication across knowledge graphs, handling entities extracted from diverse sources (transcripts, documents, structured data).
Build clustering algorithms to identify similar patterns across processes, participants, and execution paths.
Develop ML models to extract canonical patterns of process execution from clickstream data and documentation.
Create ML-powered pipelines that continuously enrich knowledge graphs with discovered patterns, relationships, and insights.
Build models to calculate and analyze conformity scores comparing actual process execution against documented procedures.
Design ML solutions that operate at enterprise scale, handling large volumes of process data and documentation.
Work with the Analysis & Evaluation team on pattern discovery from clickstream data and with the Process Improvement team on process similarity analysis.
Requirements
7+ years of engineering experience, or equivalent experience demonstrated through one or a combination of the following: work experience, training, military experience, education.
3+ years of experience with clustering algorithms and unsupervised learning techniques for pattern discovery.
2+ years of experience working with knowledge graphs or graph-based ML techniques.
5+ years of Python experience, including work with ML frameworks.
Experience with agentic data retrieval and analysis at enterprise scale.
3+ years of experience building entity resolution, deduplication, or record linkage systems at scale.
Expertise in NLP techniques for semantic similarity, text clustering, and information extraction.
Experience with graph databases.
Background in process mining, conformance checking, or business process analysis.
Experience with LLMs and embedding models for semantic similarity.
Experience building ML pipelines using MLOps best practices.
Experience with cloud computing platforms.
Experience with distributed computing frameworks.
Knowledge of containerization and orchestration technologies.
Experience in financial services or operations domains.
Excellent communication skills across technical and non-technical audiences.
Advanced degree (M.S. or Ph.D.) in Computer Science, Machine Learning, or related field.
Tech Stack
Cloud
Python
Benefits
Health benefits
401(k) Plan
Paid time off
Disability benefits
Life insurance, critical illness insurance, and accident insurance