Akraya, Inc. is an award-winning IT staffing firm recognized for excellence in the workplace. They are seeking a seasoned Data Engineer to build and scale data pipelines for next-generation foundation models, focusing on time-series and large-scale event streams.
Responsibilities:
- Construct and expand distributed data pipelines for extensive time series and log data
- Develop high-performance Spark/Python workflows for creating model training datasets
- Tackle and resolve performance issues related to latency, memory usage, data skew, and throughput
- Elevate data quality, validation, and reproducibility across machine learning workloads
- Engage with machine learning engineers and researchers to foster rapid development of foundation models
Requirements:
- Proficiency in Python
- Comprehensive experience with Apache Spark (PySpark or Scala)
- Proven track record in building large-scale data pipelines in distributed systems
- Prior experience in data engineering for machine learning or large-scale model training workflows is highly desirable
- Familiarity with time series or event-driven data systems will be advantageous