Architect and design scalable, reliable data platforms and complex ETL/ELT and streaming workflows for the Databricks Lakehouse Platform (Delta Lake, Spark).
Write, test, and optimize code in Python, PySpark, and SQL for data ingestion, transformation, and processing.
Implement CI/CD, monitoring, and automation (e.g., with Azure DevOps, DBX) for data pipelines.
Work with BI developers, analysts, and business users to define requirements and deliver data-driven solutions.
Tune Delta tables, Spark jobs, and SQL queries for maximum efficiency and scalability.
Experience in GenAI application development is a strong plus.
Requirements:
8+ years of experience in data engineering, with strong hands-on expertise in Databricks and Apache Spark.
Proven experience designing and implementing scalable ETL/ELT pipelines in cloud environments.
Strong programming skills in Python and SQL; experience with PySpark required.
Hands-on experience with Databricks Lakehouse, Delta Lake, and distributed data processing.
Experience working with cloud platforms such as Microsoft Azure, AWS, or GCP (Azure preferred).
Experience with CI/CD pipelines, Git, and DevOps practices for data engineering.
Strong understanding of data architecture, data modeling, and performance optimization.
Experience working with cross-functional teams to deliver enterprise data solutions.
Ability to tackle complex data challenges while ensuring data quality and reliable delivery.
Qualifications:
Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.
Experience designing enterprise-scale data platforms and modern data architectures.
Experience with data integration tools such as Azure Data Factory or similar platforms.
Familiarity with cloud data warehouses such as Databricks, Snowflake, or Microsoft Fabric.
Experience supporting analytics, reporting, or AI/ML workloads is highly desirable.
Databricks, Azure, or other cloud certifications are preferred.
Strong problem-solving, communication, and technical leadership skills.