OKX is a leading crypto exchange focused on reshaping the future through crypto. The Sr Data Engineer will design and maintain scalable data pipelines for compliance analytics, ensuring the data quality and reliability needed to support compliance functions.
Responsibilities:
- Design, build, and maintain scalable data pipelines that ingest, transform, and deliver data for compliance analytics use cases, including AML transaction monitoring, KYC/KYB, sanctions screening, and SAR reporting
- Assess and improve existing data infrastructure: identify pipeline gaps, lineage issues, and data quality problems that silently degrade model and analytics output, and work through them systematically
- Build and maintain compliance-specific data marts and semantic layers that allow data scientists and analysts to work independently, reducing the volume of ad-hoc data requests and increasing the team's overall throughput
- Partner with data scientists and ML engineers to productionise feature pipelines, maintain data freshness, and build the infrastructure that keeps compliance ML models running reliably in production
- Apply AI-assisted development as standard practice: use LLM tooling to write and review pipeline code, automate data quality checks, generate documentation, and accelerate debugging. The expectation is that you bring this fluency with you and use it to raise the quality and pace of your work
- Implement data quality monitoring, pipeline health checks, and alerting that surfaces data integrity issues before they affect compliance decisions or model outputs
- Work with compliance and legal teams to understand regulatory requirements around data retention, access control, and auditability, and build the controls that meet those requirements in practice
- Support regulatory lookbacks and audit responses by ensuring historical data is retrievable, lineage is documented, and the evidence base the compliance team needs can be assembled accurately and quickly
Requirements:
- 8+ years in data engineering or a closely related role, with meaningful experience in financial services, fintech, or a compliance-adjacent environment
- Strong Python and SQL, with hands-on experience in distributed computing frameworks such as Spark, Hadoop, or Databricks
- Solid experience with cloud and big data platforms including Alibaba MaxCompute, Google BigQuery, AWS Redshift, or equivalent, and a track record of building production-grade pipelines where reliability and data quality are treated as requirements rather than afterthoughts
- Hands-on fluency with AI-assisted engineering. You use LLM coding tools regularly, have applied them to real data engineering work, and have a practical view on where they improve output quality and where they need careful oversight
- Experience designing data marts, dimensional models, or semantic layers for consumption by non-engineering stakeholders, with an understanding of what makes a self-serve analytics layer actually usable
- A solid grounding in data governance, access control, and audit trail requirements in regulated industries, with an appreciation for why those requirements exist and how to implement them without creating unnecessary friction
- Good communication skills, with the ability to translate data infrastructure decisions and constraints for compliance, legal, and business stakeholders who care about outcomes rather than technical architecture
- Familiarity with the crypto ecosystem, on-chain data structures, blockchain analytics, or VASP regulatory frameworks is a meaningful advantage for this role and will give you useful context from day one