Participate in the design and implementation of distributed task orchestration systems using Temporal or Celery.
Architect pipelines across cloud object storage (S3, GCS), data lakes, and metadata catalogs.
Implement partitioning, sharding, and caching strategies to ensure data processing pipelines are resilient, highly available, and consistent.
Design, implement, and maintain distributed ingestion pipelines for structured and unstructured data (images, 3D/2D assets, binaries).
Build scalable ETL/ELT workflows to transform, validate, and enrich datasets for AI/ML model training and analytics.
Support preprocessing of unstructured assets (e.g., images, 3D/2D models, video) for training pipelines, including format conversion, normalization, augmentation, and metadata extraction.
Implement validation and quality checks to ensure datasets meet ML training requirements.
Collaborate with ML researchers to quickly adapt pipelines to evolving pretraining and evaluation needs.
Use infrastructure-as-code and container orchestration tools (Terraform, Kubernetes, etc.) to manage scalable and reproducible environments.
Manage data assets using Databricks Asset Bundles (DABs) and build rigorous CI/CD pipelines (GitHub Actions).
Maximize cluster utilization (CPU/memory) and optimize EC2 instance allocation to reduce compute costs.
Take ownership of the platform’s "Interface" by building Data Explorers and management consoles using React or Next.js.
Actively listen to researchers and data scientists to iterate on UI/UX based on their feedback.
Simplify complex CLI operations into intuitive GUI interactions to boost overall developer experience (DevEx).
Requirements
2+ years of experience in software engineering, backend development, or distributed systems.
Strong programming skills in Python (Scala/Java/C++ a plus).
Familiarity with distributed frameworks (Spark, Dask, Ray) and cloud platforms (AWS/GCP/Azure).
Experience with workflow orchestration tools (Temporal, Celery, or Airflow).
Proficiency with Infrastructure as Code (Terraform) and CI/CD tools (GitHub Actions).
Experience building web applications or internal tools using React or Next.js.
A "product-first" mindset: an interest in how users interact with infrastructure and a desire to build clean, functional interfaces.