Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. The company is seeking a skilled Databricks Platform Engineer to architect and implement scalable data solutions, design ETL/ELT pipelines, and drive performance optimization on the Databricks Lakehouse Platform.
Responsibilities:
- Architect and implement scalable data solutions on the Databricks Lakehouse Platform, utilizing Apache Spark and PySpark for distributed processing at scale (see the PySpark sketch after this list)
- Design and build ETL/ELT pipelines with SQL and Delta Lake, ensuring ACID transactions, schema evolution, and reliable data ingestion from sources such as AWS S3, Azure ADLS, and Google Cloud Storage (GCS); a Delta Lake ingestion sketch follows this list
- Build and orchestrate Databricks Workflows for automated job scheduling, dependency management, and end-to-end data pipeline execution (see the Workflows sketch below)
- Develop advanced Data Modeling strategies, including dimensional modeling and unified analytics layers, optimized for Databricks' lakehouse architecture
- Drive Performance Optimization through Spark tuning, caching strategies, and Delta Lake optimizations to handle petabyte-scale workloads efficiently (a tuning sketch follows this list)
- Integrate with Cloud Platforms (AWS, Azure, GCP) for seamless hybrid deployments, leveraging native storage and compute resources
- Implement CI/CD pipelines with Git for version-controlled Databricks notebooks, clusters, and workflows, enabling rapid iteration and deployment
- Enforce Security & Access Controls using Unity Catalog, row/column-level security, and fine-grained permissions to protect sensitive data assets (see the Unity Catalog sketch below)
- Collaborate in Agile methodologies, contributing to sprint planning, code reviews, and iterative delivery of data engineering features
- Monitor, troubleshoot, and scale Databricks environments, delivering cost-effective, high-performance data platforms that power business intelligence and ML use cases
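To make the Spark/PySpark expectations concrete, here is a minimal sketch of a distributed aggregation; the paths, table, and column names are hypothetical and only illustrate the pattern:

    # Minimal PySpark sketch: distributed aggregation over a large dataset.
    # All paths and column names below are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily-order-rollup").getOrCreate()

    orders = spark.read.parquet("s3://example-bucket/raw/orders/")

    daily_totals = (
        orders
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("order_date", "region")
        .agg(
            F.sum("amount").alias("total_amount"),
            F.count("*").alias("order_count"),
        )
    )

    daily_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")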
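For the Delta Lake side of the role, a typical bronze-layer ingestion step might look like the following; the storage account, landing path, and table name are assumptions, not details from this posting:

    # Sketch of Delta Lake ingestion with schema evolution enabled.
    # The landing path and table name are illustrative assumptions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bronze-ingest").getOrCreate()

    incoming = spark.read.json(
        "abfss://landing@exampleaccount.dfs.core.windows.net/events/"
    )

    (incoming.write
        .format("delta")
        .mode("append")                 # appends run inside an ACID transaction
        .option("mergeSchema", "true")  # new columns evolve the table schema
        .saveAsTable("bronze.events"))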
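Workflows can be defined in the UI, as JSON, or programmatically; below is a hedged sketch using the databricks-sdk Python package to chain two notebook tasks. The job name, notebook paths, and cluster ID are invented for illustration:

    # Hedged sketch: a two-task Databricks Workflow via databricks-sdk.
    # Job name, notebook paths, and cluster id are hypothetical.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()  # picks up credentials from the environment

    job = w.jobs.create(
        name="nightly-etl",
        tasks=[
            jobs.Task(
                task_key="ingest",
                notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/ingest"),
                existing_cluster_id="1234-567890-abcde123",
            ),
            jobs.Task(
                task_key="transform",
                depends_on=[jobs.TaskDependency(task_key="ingest")],
                notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/transform"),
                existing_cluster_id="1234-567890-abcde123",
            ),
        ],
    )
    print(f"Created job {job.job_id}")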
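On the optimization front, two routine levers are file compaction with Z-ordering and caching hot tables; the table and column names here are placeholders:

    # Illustrative Delta Lake tuning steps; table/column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Compact small files and co-locate rows on a common filter column.
    spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")

    # Cache a frequently joined dimension table for the rest of the job.
    dim_customers = spark.table("silver.customers").cache()
    dim_customers.count()  # action that materializes the cache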
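Finally, a sketch of the Unity Catalog controls mentioned above, expressed as SQL run through PySpark; the catalog, schema, group, and function names are all assumptions:

    # Sketch of Unity Catalog grants and row-level security.
    # Principal, table, and function names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Fine-grained permission: read-only access for an analyst group.
    spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `analysts`")

    # Row-level security: a boolean filter function bound to the table.
    spark.sql("""
        CREATE OR REPLACE FUNCTION main.sales.us_rows_only(region STRING)
        RETURN is_account_group_member('global_admins') OR region = 'US'
    """)
    spark.sql(
        "ALTER TABLE main.sales.orders "
        "SET ROW FILTER main.sales.us_rows_only ON (region)"
    )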
Requirements:
- Proven experience architecting and implementing scalable data solutions on the Databricks Lakehouse Platform, with strong Apache Spark and PySpark skills for distributed processing
- Hands-on experience designing ETL/ELT pipelines with SQL and Delta Lake, including ACID transactions, schema evolution, and ingestion from sources such as AWS S3, Azure ADLS, and Google Cloud Storage (GCS)
- Experience building and orchestrating Databricks Workflows for job scheduling, dependency management, and end-to-end pipeline execution
- Solid grounding in data modeling, including dimensional modeling and unified analytics layers optimized for the lakehouse architecture
- Demonstrated ability to tune Spark jobs, apply caching strategies, and use Delta Lake optimizations for large-scale workloads
- Familiarity with at least one major cloud platform (AWS, Azure, or GCP), including its native storage and compute services
- Experience implementing CI/CD pipelines with Git for version-controlled Databricks notebooks, clusters, and workflows
- Working knowledge of Unity Catalog, row/column-level security, and fine-grained permissions for protecting sensitive data
- Comfort working in Agile teams, including sprint planning, code reviews, and iterative delivery
- Track record of monitoring, troubleshooting, and scaling Databricks environments to deliver cost-effective, high-performance platforms for BI and ML use cases