Brillio is one of the fastest-growing digital technology service providers and a partner of choice for many Fortune 1000 companies. The Senior Lead Data Engineer will design and maintain ETL/ELT pipelines, optimize cloud infrastructure, and ensure data quality while collaborating in agile teams.
Responsibilities:
- Design, develop, and maintain ETL/ELT pipelines using PySpark and Python in Databricks (including notebooks, jobs, Delta Lake tables, and Unity Catalog for governance)
- Implement a medallion architecture (bronze/silver/gold layers) and optimize Spark jobs for performance, cost, scalability, and reliability (handling partitioning, skew, caching, adaptive query execution, etc.)
- Write efficient SQL queries for data transformation, validation, and analytics within Databricks
- Provision and manage cloud infrastructure using Terraform (IaC) for Databricks workspaces, clusters, jobs, storage (ADLS/S3), networking, IAM roles/permissions, and related resources on Azure and/or AWS
- Implement and maintain CI/CD pipelines using Jenkins and GitHub (Actions/Repositories), with branching strategies that support automated testing and deployment of notebooks, jobs, Delta Live Tables, and Terraform configurations
- Integrate data from diverse sources (databases, APIs, streaming, files) into cloud storage and processing layers
- Ensure data quality, lineage, security, and compliance (including Delta Lake ACID transactions, schema evolution, time travel, and access controls)
- Monitor pipeline performance, troubleshoot failures, and implement alerting/observability (using Databricks tools, cloud monitoring services, or third-party solutions)
- Optimize cloud costs through auto-scaling clusters, spot instances, job scheduling, and efficient resource usage
- Collaborate in agile teams, participate in code reviews, and contribute to best practices for data engineering
Requirements:
- Strong proficiency in Python and PySpark for distributed data processing and ETL
- Advanced SQL skills with experience in complex querying, window functions, and optimization
- Hands-on experience with Databricks (clusters, notebooks, Delta Lake, Unity Catalog, Delta Live Tables, workflows/jobs)
- Proficiency in Terraform for infrastructure provisioning and management (Databricks resources, cloud storage, IAM, networking)
- Experience with GitHub for version control and collaboration (branching, pull requests, code reviews)
- Solid knowledge of CI/CD practices and tools, particularly Jenkins (pipelines, plugins for Databricks/GitHub/Terraform)
- Working experience with Azure (Data Lake, Data Factory, Synapse, Key Vault, etc.) and/or AWS (S3, Glue, EMR, IAM, Lambda, etc.)
- Understanding of big data concepts, data modeling (star/snowflake, dimensional), and lakehouse principles
- Familiarity with performance tuning in Spark/Databricks environments