Arena Club is pioneering the collectibles domain with the first-ever digital card show. The company is seeking a Senior Data Engineer to strengthen strategic decision-making, enhance operational performance, and integrate data across the company to unlock deeper insights into customer behavior and market performance.
Responsibilities:
- Maintain and optimize inbound and outbound ETL pipelines built on AWS Glue (Python Shell & Spark ETL)
- Manage Redshift cluster performance across various schemas
- Own integrations with SaaS data sources via AppFlow and direct connectors
- Operate outbound distribution pipelines to external vendors
- Manage infrastructure, alerting, and migration state tracking
- Lead the migration from ad-hoc SQL scripts to a Bronze/Silver/Gold medallion architecture with dbt as the transformation layer
- Design and implement dimensional models, i.e., fact and dimension tables
- Build the Silver staging layer (an illustrative job sketch follows this list)
- Architect the real-time change data capture (CDC) pipeline
- Implement data contracts and governance at the Silver layer to insulate downstream consumers from source changes
- Implement a hot/cold storage strategy via Redshift Spectrum
- Build the Unified Access Layer
- Design and automate Glue jobs
- Configure S3 lifecycle policies for progressive cost reduction
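
For context on the migration work above, here is a minimal sketch of the kind of Glue Python Shell job that builds the Silver staging layer: it copies a Bronze Parquet extract from S3 into a temp table and applies the classic Redshift delete-then-insert upsert. The bucket, IAM role, cluster endpoint, and the silver.orders / order_id names are all hypothetical, and redshift_connector is assumed to be installed via the job's additional Python modules.

```python
# Minimal sketch (hypothetical names throughout) of a Glue Python Shell job
# that stages a Bronze Parquet extract from S3 into a Silver table in Redshift
# using the classic delete-then-insert upsert pattern.
import redshift_connector  # assumed installed via the job's additional Python modules

BRONZE_PATH = "s3://example-bronze-bucket/orders/"              # hypothetical path
COPY_ROLE = "arn:aws:iam::123456789012:role/example-copy-role"  # hypothetical role

STATEMENTS = [
    # Stage the extract in a temp table shaped like the target.
    "CREATE TEMP TABLE orders_stage (LIKE silver.orders);",
    f"COPY orders_stage FROM '{BRONZE_PATH}' IAM_ROLE '{COPY_ROLE}' FORMAT AS PARQUET;",
    # Classic Redshift upsert: remove matching keys, then insert the fresh rows.
    "DELETE FROM silver.orders USING orders_stage "
    "WHERE silver.orders.order_id = orders_stage.order_id;",
    "INSERT INTO silver.orders SELECT * FROM orders_stage;",
]

def run_upsert() -> None:
    conn = redshift_connector.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical
        database="analytics",
        user="glue_etl",
        password="fetch-from-secrets-manager-instead",  # placeholder, never hard-code
    )
    try:
        cur = conn.cursor()
        for sql in STATEMENTS:
            cur.execute(sql)
        conn.commit()  # one transaction covering stage, delete, and insert
    finally:
        conn.close()

if __name__ == "__main__":
    run_upsert()
```

In practice the same load would typically be expressed as an incremental dbt model once the medallion migration lands; the raw-SQL form above is shown only to illustrate the staging and upsert mechanics.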
Requirements:
- 5+ years in data engineering with production pipeline ownership (not just analytics or BI)
- Deep AWS experience: Glue (both Python Shell and Spark ETL), Redshift, S3, IAM, EventBridge, Lambda, AppFlow
- Strong SQL: complex joins, window functions, MERGE/UPSERT patterns, Redshift-specific optimization (sort keys, dist keys, VACUUM/ANALYZE)
- Python fluency: boto3, data processing libraries, writing production Glue scripts (not just notebooks)
- Dimensional modeling: star schemas, fact/dimension design, SCD Type 1 and Type 2 implementation
- dbt: hands-on experience building and maintaining staging, intermediate, and mart models with tests and documentation
- Data warehouse operations: schema migration, incremental loads, backfill strategies, monitoring, and alerting
- Redshift Spectrum: experience with external schemas, Parquet/Hive partitioning, and unified hot/cold querying
- CDC / streaming: Postgres WAL, Debezium, EventBridge, or similar change data capture pipelines
- Data Mesh concepts: domain-oriented ownership, data-as-a-product thinking, federated governance
- AppFlow & SaaS integrations: configuring and troubleshooting managed connectors for Stripe, Zendesk, Mixpanel, etc.
- Cost optimization: right-sizing Glue jobs (Python Shell vs. Spark), Redshift concurrency scaling, S3 lifecycle policies (an illustrative policy sketch follows this list)
- Vendor distribution: building outbound API sync jobs with rate limiting, SFTP transfers, webhook delivery
- Familiarity with marketplace or e-commerce data (orders, payments, attribution, promo codes)
- Experience with Mixpanel, Customer.io, or Singular data exports and event schemas
- Prior experience migrating from monolithic ETL to medallion or lakehouse architectures
- Exposure to data governance tooling: data catalogs, lineage tracking, quality frameworks (e.g., Great Expectations, dbt tests)
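
To make the hot/cold storage and cost-optimization expectations concrete, the sketch below uses boto3 to apply the sort of S3 lifecycle policy referenced above, progressively transitioning aged Bronze objects to cheaper storage classes. The bucket name, prefix, and day thresholds are hypothetical.

```python
# Sketch of an S3 lifecycle policy for progressive cost reduction on aged
# Bronze data: objects move to Infrequent Access after 90 days and to Glacier
# after 365 days. Bucket name, prefix, and day thresholds are hypothetical.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-bronze-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "bronze-progressive-tiering",
                "Filter": {"Prefix": "bronze/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
                # Clean up incomplete multipart uploads left behind by failed writes.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```

Partitions transitioned to Infrequent Access stay directly queryable through Redshift Spectrum external tables, while Glacier-tiered objects would generally need to be restored first.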