Data Engineer (Google Cloud Platform), Contract, Multiple Openings
Type: Contract (6 months, with likely long-term extension)
Location: Remote (PST/EST overlap required)
Technical Requirements
Must-Have
- PySpark & Databricks: Strong hands-on experience building and maintaining production pipelines. Experience with Unity Catalog is a plus.
- Python Engineering: Primary development language, with production-grade practices (typed, tested, modular code, not notebook-only development).
- Google Cloud Platform Ecosystem: Proven experience with BigQuery, Dataproc, Airflow/Cloud Composer, Pub/Sub, and Cloud Storage (Parquet).
- Data Ingestion: Experience working with complex, multi-source datasets across varying schemas and formats (e.g., JSON, CSV, XML, custom feeds).
- Data Modeling: Strong understanding of staging and curated layer design, partitioning strategies, and schema evolution across distributed data sources.
- Experience Level: 4-7+ years of hands-on data engineering experience, with a track record of building and maintaining production systems (not research-focused profiles).
Nice-to-Have
- dbt: Experience working with dbt, ideally within a medallion/lakehouse architecture.
- Entity Resolution: Familiarity with record linkage techniques (fuzzy matching, phonetic similarity, Python-based frameworks).
- Domain Experience: Exposure to music, royalties, or media rights data is a plus.