Crossing Hurdles is seeking a Data Engineer to design, build, and maintain scalable data pipelines. The role involves collaborating with AI researchers and data scientists, developing data models, and ensuring data quality across pipelines.
Responsibilities:
- Design, build, and maintain scalable data pipelines to ingest, process, and transform data from multiple sources
- Collaborate with AI researchers and data scientists to structure and prepare datasets for experimentation and model training
- Develop and maintain data models, schemas, and storage systems optimized for large-scale datasets
- Write efficient SQL queries and Python scripts to extract, transform, and analyze data
- Ensure data quality, integrity, and reliability across data pipelines and storage layers
- Implement data validation, monitoring, and automation workflows that support iterative research cycles
Requirements:
- Strong proficiency in Python and SQL
- Experience designing and maintaining ETL/ELT pipelines
- Solid experience with data manipulation libraries such as Pandas and NumPy
- Experience working with structured and semi-structured datasets
- Familiarity with relational databases such as PostgreSQL or MySQL
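To give candidates a feel for the day-to-day work, here is a minimal sketch of the kind of ETL pipeline described above: extract raw records, transform them with Pandas, load them into a relational store, and run a basic data-quality check. This is illustrative only; the table, column names, and inline data are hypothetical, and SQLite stands in for a production database such as PostgreSQL.

```python
# Minimal ETL sketch (illustrative): extract -> transform -> load + quality check.
# All table/column names and records are hypothetical.
import sqlite3

import pandas as pd


def extract() -> pd.DataFrame:
    # In practice this would pull from an API, file drop, or upstream warehouse;
    # inline records keep the sketch self-contained and runnable.
    return pd.DataFrame([
        {"user_id": 1, "event": "click", "ts": "2024-01-01T10:00:00"},
        {"user_id": 2, "event": "view",  "ts": "2024-01-01T10:05:00"},
        {"user_id": 1, "event": "click", "ts": None},  # malformed row
    ])


def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Parse timestamps; coerce unparseable values to NaT, then drop them
    # as a simple data-validation step.
    df = df.copy()
    df["ts"] = pd.to_datetime(df["ts"], errors="coerce")
    return df.dropna(subset=["ts"])


def load(df: pd.DataFrame, conn: sqlite3.Connection) -> int:
    df.to_sql("events", conn, index=False, if_exists="replace")
    # Basic integrity check: stored row count must match the frame.
    n = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    assert n == len(df), "row-count mismatch after load"
    return n


conn = sqlite3.connect(":memory:")
loaded = load(transform(extract()), conn)
```

A production version of this would add incremental loading, schema migrations, and monitoring, but the extract/transform/load split and the post-load validation step mirror the responsibilities listed above.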