Tech Genius Inc. is seeking an Azure Databricks Engineer for its client, Guy Carpenter. The role involves designing and building data pipelines, optimizing cluster configurations, and ensuring data quality and validation within the Azure Databricks platform.
Responsibilities:
- Design and Build Data Pipelines
  - Develop scalable and efficient data ingestion pipelines using Azure Databricks and Apache Spark
- Data Transformation and Processing
  - Create and maintain data transformation scripts and notebooks using Spark with Python, SQL, or Scala
  - Implement ELT/ETL workflows to prepare data for analytics and reporting
- Cluster Configuration and Optimization
  - Configure Azure Databricks clusters for optimal performance, cost-efficiency, and resource utilization
  - Manage cluster policies, including autoscaling, instance types, and runtime versions
  - Monitor cluster health and tune Spark configurations to improve job execution times
- Data Quality and Validation
  - Implement data quality checks and validation processes within Databricks notebooks and workflows
- Automation and CI/CD
  - Develop and maintain CI/CD pipelines for automated deployment of Databricks notebooks, jobs, and configurations
- Monitoring and Troubleshooting
  - Monitor Databricks jobs, clusters, and pipelines for failures and performance bottlenecks, and track platform usage metrics
  - Troubleshoot and resolve issues related to data processing, cluster performance, job execution, and the platform itself
- Cost Management and Optimization
  - Track and analyze Databricks usage and costs
  - Optimize cluster configurations and usage patterns to reduce expenses
  - Implement policies to control resource consumption and prevent cost overruns
- Collaboration and Support
  - Provide technical support and guidance on Databricks platform usage and best practices
Requirements:
- Hands-on experience developing scalable, efficient data ingestion pipelines with Azure Databricks and Apache Spark
- Working knowledge of DBT is a plus
- Proficiency creating and maintaining data transformation scripts and notebooks using Spark with Python, SQL, or Scala
- Experience implementing ELT/ETL workflows to prepare data for analytics and reporting
- Ability to configure Azure Databricks clusters for optimal performance, cost-efficiency, and resource utilization
- Experience managing cluster policies, including autoscaling, instance types, and runtime versions
- Ability to monitor cluster health and tune Spark configurations to improve job execution times
- Experience implementing data quality checks and validation processes within Databricks notebooks and workflows
- Experience developing and maintaining CI/CD pipelines for automated deployment of Databricks notebooks, jobs, and configurations
- Ability to monitor Databricks jobs, clusters, and pipelines for failures and performance bottlenecks
- Ability to troubleshoot and resolve issues related to data processing, cluster performance, job execution, and the platform itself
- Experience tracking and analyzing Databricks usage and costs
- Ability to optimize cluster configurations and usage patterns to reduce expenses
- Experience implementing policies to control resource consumption and prevent cost overruns
- Strong communication skills for providing technical support and guidance on Databricks platform usage and best practices