Architect and implement entity resolution logic to de-duplicate and link disparate data points into unified "Golden Records" for businesses and individuals
Design and maintain a high-performance global business knowledge graph and ontology to map complex ownership chains, ultimate beneficial owners (UBOs), and hidden risk relationships across international borders
Implement a hybrid storage strategy that bridges graph databases for relationship mapping with document and search stores for rich metadata and adverse media content
Optimize the platform for real-time risk assessment, ensuring the ability to traverse multiple levels of ownership in milliseconds to support automated "Go/No-Go" onboarding decisions (an illustrative traversal sketch follows this list)
Design and build scalable data services and APIs for ingesting, transforming, and serving data across the company
Develop and maintain batch and streaming data pipelines using modern data processing frameworks and AWS cloud-native tooling
Own the reliability and performance of the API-first data platform, including monitoring, alerting, and on-call support where appropriate
Implement best practices for data modeling, quality, lineage, and governance to ensure trustworthy, well-documented datasets
Work closely with data scientists, analysts, and application engineers to understand their needs and translate them into robust platform capabilities
Drive automation and standardization through CI/CD, model-as-a-service, and reproducible environments
Help define and evolve the architecture of our data platform as a true internal service with clear contracts, SLAs, and versioned APIs
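For illustration only, a minimal sketch of the kind of multi-hop ownership traversal referenced above, assuming a Neo4j-style graph where Person and Company nodes are linked by OWNS relationships; the node labels, the share property, the five-hop limit, and the helper function are hypothetical and stand in for whatever schema the actual platform uses.

```python
from neo4j import GraphDatabase

# Hypothetical schema for illustration: (:Person|:Company)-[:OWNS {share}]->(:Company)
OWNERSHIP_QUERY = """
MATCH path = (owner)-[:OWNS*1..5]->(target:Company {id: $company_id})
WHERE owner:Person OR owner:Company
RETURN owner, [r IN relationships(path) | r.share] AS shares
LIMIT 100
"""

def upstream_owners(uri: str, user: str, password: str, company_id: str) -> list[dict]:
    """Return direct and indirect owners of a company, up to five hops away."""
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            result = session.run(OWNERSHIP_QUERY, company_id=company_id)
            return [record.data() for record in result]
```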
Requirements
Expertise in Graph Ecosystems: Hands-on experience with graph databases (e.g., Neo4j, AWS Neptune, or TigerGraph) and query languages like Cypher or Gremlin
Identity & Linkage Mastery: Proven experience with Entity Resolution or Record Linkage (e.g., using tools like Senzing, Quantexa, or custom probabilistic matching models)
Schema Design: Ability to design flexible ontologies that handle evolving regulatory data (e.g., changing politically exposed person (PEP) definitions or sanctions list formats)
API Performance for Graphs: Experience building GraphQL or REST APIs specifically optimized for graph traversals and deep-tree lookups
Experience building centralized data platforms or “data-as-a-service” offerings at scale (e.g., at a large tech or cloud-native company)
Strong software engineering skills in at least one language commonly used for data and services (e.g., Python, Java, Go, Rust)
Hands-on experience building data pipelines and ETL/ELT workflows on a major cloud provider (AWS preferred)
Experience with modern data stack tools such as Spark/Flink, Kafka/Kinesis, Airflow/managed schedulers, and data warehouses (e.g., Snowflake, Redshift, BigQuery, Databricks)
Familiarity with DevOps practices: CI/CD, containerization (Docker), orchestration (Kubernetes), and infrastructure-as-code (Terraform)
Strong focus on observability (metrics, logs, traces), resilience, and building early-warning signals
Comfort collaborating cross-functionally and communicating clearly with both technical and non-technical stakeholders
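Purely illustrative: a toy probabilistic record-linkage rule of the kind the entity-resolution requirement refers to, built only on the Python standard library; the field names, weights, and threshold are invented for the example and stand in for a real matching model such as Senzing or a trained classifier.

```python
from difflib import SequenceMatcher

def match_score(a: dict, b: dict) -> float:
    # Weighted blend of fuzzy name similarity and exact date-of-birth agreement.
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    dob_match = 1.0 if a.get("dob") and a.get("dob") == b.get("dob") else 0.0
    return 0.7 * name_sim + 0.3 * dob_match

def same_entity(a: dict, b: dict, threshold: float = 0.8) -> bool:
    # Pairs scoring above the threshold would be linked into one golden record.
    return match_score(a, b) >= threshold

# Same person recorded slightly differently across two source systems.
print(same_entity({"name": "John A. Smith", "dob": "1980-05-02"},
                  {"name": "Jon Smith", "dob": "1980-05-02"}))
```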