Medasource is seeking a highly skilled Data Engineer to design, build, and optimize modern data pipelines within a cloud-first architecture. The role focuses on Azure Databricks and CI/CD best practices and plays a critical part in developing scalable data solutions that support analytics and advanced data initiatives, including AI/ML.
Responsibilities:
- Design, build, and maintain scalable data pipelines using Azure Databricks
- Develop and optimize ETL/ELT workflows using Apache Airflow
- Implement CI/CD pipelines and version control strategies using GitHub
- Collaborate with Data Architects, Analysts, and Data Scientists to deliver high-quality datasets
- Develop and maintain data models and transformations (batch and streaming)
- Monitor data pipeline performance and troubleshoot production issues
- Ensure data governance, security, and compliance standards are met
- Optimize Spark workloads for performance and cost efficiency in Azure
Requirements:
- 3+ years of experience as a Data Engineer
- Strong experience with Azure Databricks (Spark, PySpark, SQL)
- Hands-on experience building and maintaining workflows in Apache Airflow
- Proficiency with GitHub for version control and CI/CD
- Strong SQL skills and experience with relational and cloud-based databases
- Experience working in Azure cloud environments
- Solid understanding of data modeling concepts (star schema, normalization, etc.)
- Experience building scalable, production-grade data pipelines