TRM Labs is a company that provides blockchain analytics and AI solutions to enhance security for law enforcement and financial institutions. They are seeking a Senior Data Engineer to design and implement a scalable data lakehouse architecture, focusing on data modeling, ingestion, and query optimization. The role involves collaboration across departments to support complex analytical workloads and ensure data governance.
Responsibilities:
- Architect and scale a high-performance data lakehouse on GCP, leveraging technologies like StarRocks, Apache Iceberg, GCS, BigQuery, Dataproc, and Kafka
- Design, build, and optimize distributed query engines such as Trino, Spark, or Snowflake to support complex analytical workloads
- Implement metadata management for open table formats like Iceberg, along with data discovery frameworks for governance and observability, using Iceberg-compatible catalogs
- Develop and orchestrate robust ETL/ELT pipelines using Apache Airflow, Spark, and GCP-native tools (e.g., Dataflow, Composer)
- Collaborate across departments, partnering with data scientists, backend engineers, and product managers to design and implement
Requirements:
- 5+ years of experience in data or software engineering, with a focus on distributed data systems and cloud-native architectures
- Proven experience building and scaling data platforms on GCP, including storage, compute, orchestration, and monitoring
- Strong command of one or more query engines such as Trino, Presto, Spark, or Snowflake
- Experience with modern table formats like Apache Hudi, Iceberg, or Delta Lake
- Exceptional programming skills in Python, as well as adeptness in SQL or SparkSQL
- Hands-on experience orchestrating workflows with Airflow and building streaming/batch pipelines using GCP-native services