TRM Labs is a company that provides blockchain analytics and AI solutions to help public- and private-sector organizations combat financial crime. They are seeking a Senior Data Engineer to design, implement, and scale their data lakehouse architecture, focusing on data modeling, ingestion, and performance optimization.
Responsibilities:
- Architect and scale a high-performance data lakehouse on GCP, leveraging technologies like StarRocks, Apache Iceberg, GCS, BigQuery, Dataproc, and Kafka
- Deploy, tune, and optimize distributed query engines such as Trino or Spark, as well as managed platforms like Snowflake, to support complex analytical workloads
- Implement metadata management for open table formats like Iceberg, along with data discovery frameworks for governance and observability, using Iceberg-compatible catalogs
- Develop and orchestrate robust ETL/ELT pipelines using Apache Airflow, Spark, and GCP-native tools (e.g., Dataflow, Composer)
- Collaborate across departments, partnering with data scientists, backend engineers, and product managers to design and implement end-to-end data solutions
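To give candidates a concrete feel for the pipeline work above, here is a minimal sketch of the upsert (MERGE-style) logic a batch ELT task might perform before writing to a lakehouse table. All names and the record schema are illustrative assumptions, not TRM's actual data model:

```python
from dataclasses import dataclass

# Hypothetical record shape; field names are illustrative only.
@dataclass(frozen=True)
class TxRecord:
    tx_id: str        # primary key
    amount: float
    updated_at: int   # epoch seconds

def merge_upsert(existing: list[TxRecord], incoming: list[TxRecord]) -> list[TxRecord]:
    """Keep the latest version of each record by primary key,
    mimicking the MERGE INTO step of an ELT pipeline."""
    latest: dict[str, TxRecord] = {r.tx_id: r for r in existing}
    for r in incoming:
        current = latest.get(r.tx_id)
        # Incoming wins only if it is at least as fresh as what we hold.
        if current is None or r.updated_at >= current.updated_at:
            latest[r.tx_id] = r
    return sorted(latest.values(), key=lambda r: r.tx_id)
```

In practice this deduplication would be expressed in Spark SQL or a warehouse MERGE statement rather than plain Python, but the last-writer-wins semantics are the same.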
Requirements:
- 5+ years of experience in data or software engineering, with a focus on distributed data systems and cloud-native architectures
- Proven experience building and scaling data platforms on GCP, including storage, compute, orchestration, and monitoring
- Strong command of one or more query engines such as Trino, Presto, Spark, or Snowflake
- Experience with modern table formats like Apache Hudi, Iceberg, or Delta Lake
- Exceptional programming skills in Python, along with fluency in SQL or Spark SQL
- Hands-on experience orchestrating workflows with Airflow and building streaming/batch pipelines using GCP-native services