Udio, a company focused on generative audio models, is seeking a Senior Backend Engineer to lead the unification of large datasets from multiple external providers. The role involves building scalable systems for data ingestion, linkage, and quality assessment to support ML research and product development.
Responsibilities:
- Build high-throughput bulk ingestion workflows to integrate datasets from multiple external providers
- Design and implement scalable entity-resolution solutions, including record linking, deduplication, clustering, and conflict arbitration
- Create and refine matching logic, decision rules, and similarity functions to align datasets with high accuracy and strong coverage
- Define and track data quality indicators, such as overlap metrics, match precision/recall, duplicate rates, and completeness
- Prepare training-ready datasets in formats such as TFRecords, and structure data to meet ML research requirements
- Develop processing components using Dataflow (Beam) and manage large analytical workloads in BigQuery
- Use frameworks such as Ray to accelerate large-scale experiments, feature extraction, and research-oriented data preparation
- Collaborate with ML researchers to anticipate downstream requirements and evolve linkage strategies as new sources and use cases emerge