DeepRec.ai is partnering with an AI-focused HealthTech company specializing in early-stage cancer detection. This remote Data Engineering role focuses on building and maintaining scalable pipelines for large healthcare datasets while ensuring data quality and compliance with healthcare regulations.
Responsibilities:
- Work with Data Scientists and ML Engineers to define data needs for LLM and ML models
- Build and maintain scalable data pipelines for large healthcare datasets
- Ensure data quality through cleaning, validation, and monitoring
- Design efficient data structures and schemas for model training and inference
- Source new data while ensuring compliance with healthcare regulations (e.g., HIPAA)
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field
- Experience as a Data Engineer working with large-scale or big data systems such as Apache Spark
- Strong programming skills in Python, Scala, or Java
- Experience with ETL pipelines, data warehousing, and data modelling
- Familiarity with cloud platforms (AWS, GCP, or Azure)
- Strong problem-solving skills
Preferred:
- Master's degree in Computer Science, Engineering, Data Science, or a related field
- Experience working with healthcare data and standards such as FHIR or HL7
- Familiarity with machine learning concepts and LLM fine-tuning workflows
- Experience using data orchestration tools such as Apache Airflow