Docker, Linux, PySpark, Python, SparkSQL, R, NLP, LLM, RAG, Jupyter, Git, Confluence, Communication, Collaboration, Remote Work
About this role
Role Overview
Lead data exploration and analysis on large-scale financial crime datasets, including sanctions, PEP (Politically Exposed Persons), and adverse media data, to uncover patterns, identify false positives and false negatives, and drive feature improvements.
Develop and evaluate agents and rule-based models by running experiments, validating hypotheses, and fine-tuning thresholds to improve alert efficiency.
Build and deliver production-ready API integrations, coordinating with software engineers and product teams to ensure components are properly integrated, tested, and merged.
Conduct customer-focused data studies across multiple enterprise clients (e.g., financial institutions) to benchmark model performance, assess data quality, and propose data-driven solutions that reduce investigation workloads.
Prototype and iterate quickly, using PySpark, Jupyter notebooks, and Python to explore data, build reproducible pipelines, and generate insights that inform product decisions.
Investigate and resolve product issues in collaboration with engineering and product teams.
Contribute to R&D on emerging techniques, including graph-based approaches (GNNs, graph embeddings) for transaction monitoring, LLM-based feature exploration, and RAG-based models.
Communicate findings clearly through well-organized Jupyter notebooks, internal documentation, and stakeholder presentations, translating complex analytical results into actionable business insights.
Requirements
Bachelor's or Master's degree in Data Science, Computer Science, Statistics, or a related field
At least 3 years of hands-on experience delivering data science projects, ideally in financial crime compliance, name screening, or AML/KYC domains
Strong proficiency in Python (data manipulation, modelling, pipeline development) and SQL / Spark SQL for large-scale data querying and transformation
Hands-on experience with PySpark or similar distributed data platforms
Familiarity with NLP techniques and entity resolution concepts
Experience working with LLMs or RAG-based models for information extraction or classification tasks is an advantage
Solid understanding of data quality assessment, including profiling, anomaly identification, and merging logic across complex multi-source datasets
Comfortable working in Git, Docker, Linux, and collaborative development workflows (including code reviews and pull requests)
Strong analytical and problem-solving skills: able to investigate ambiguous data issues, form hypotheses, and validate findings rigorously
Good communication skills: able to document findings in a structured and reproducible manner (Jupyter notebooks, Confluence) and present results clearly to both technical and non-technical stakeholders
A mindset of ownership and curiosity: you take initiative, ask the right questions, and follow through to delivery.