Tech Genius Inc. is seeking an Azure Databricks Engineer to design and build data pipelines. The role involves developing data ingestion pipelines, managing cluster configurations, and ensuring data quality and validation within the Azure Databricks platform.
Responsibilities:
- Develop scalable and efficient data ingestion pipelines using Azure Databricks and Apache Spark
- Create and maintain data transformation scripts and notebooks using Spark with Python, SQL, or Scala
- Implement ELT/ETL workflows to prepare data for analytics and reporting
- Configure Azure Databricks clusters for optimal performance, cost-efficiency, and resource utilization
- Manage cluster policies, including autoscaling, instance types, and runtime versions
- Monitor cluster health and tune Spark configurations to improve job execution times
- Implement data quality checks and validation processes within Databricks notebooks and workflows
- Develop and maintain CI/CD pipelines for automated deployment of Databricks notebooks, jobs, and configurations
- Monitor Databricks jobs, clusters, and pipelines for failures or performance bottlenecks, and report on platform usage metrics
- Troubleshoot and resolve issues related to data processing, cluster performance, job execution, and the platform itself
- Track and analyze Databricks usage and costs
- Optimize cluster configurations and usage patterns to reduce expenses
- Implement policies to control resource consumption and prevent cost overruns
- Provide technical support and guidance on Databricks platform usage and best practices
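Cluster policies of the kind described above are defined in Databricks as JSON documents that constrain what users can configure. The sketch below shows the general shape of such a policy; the specific runtime pattern, node types, and limits are illustrative assumptions, not values from this posting:

```json
{
  "spark_version": {
    "type": "regex",
    "pattern": "1[34]\\.[0-9]+\\.x-scala.*"
  },
  "node_type_id": {
    "type": "allowlist",
    "values": ["Standard_DS3_v2", "Standard_DS4_v2"],
    "defaultValue": "Standard_DS3_v2"
  },
  "autoscale.min_workers": {
    "type": "fixed",
    "value": 1
  },
  "autoscale.max_workers": {
    "type": "range",
    "maxValue": 8,
    "defaultValue": 4
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 30,
    "hidden": true
  }
}
```

A policy like this pins allowed runtimes and instance types, caps autoscaling, and forces auto-termination of idle clusters, which is one common way to control resource consumption and prevent cost overruns.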
Requirements:
- Proven experience developing scalable, efficient data ingestion pipelines with Azure Databricks and Apache Spark
- Proficiency writing data transformation scripts and notebooks in Spark with Python, SQL, or Scala
- Hands-on experience implementing ELT/ETL workflows that prepare data for analytics and reporting
- Experience configuring Azure Databricks clusters for performance, cost-efficiency, and resource utilization
- Experience managing cluster policies, including autoscaling, instance types, and runtime versions
- Ability to monitor cluster health and tune Spark configurations to improve job execution times
- Experience implementing data quality checks and validation processes in Databricks notebooks and workflows
- Experience building and maintaining CI/CD pipelines for automated deployment of Databricks notebooks, jobs, and configurations
- Ability to monitor Databricks jobs, clusters, and pipelines for failures and performance bottlenecks
- Ability to troubleshoot and resolve issues related to data processing, cluster performance, job execution, and the platform itself
- Experience tracking and analyzing Databricks usage and costs
- Experience optimizing cluster configurations and usage patterns to reduce expenses
- Experience implementing policies to control resource consumption and prevent cost overruns
- Ability to provide technical support and guidance on Databricks platform usage and best practices
- Working knowledge of dbt a plus