Architect Scalable Data Pipelines: Design, develop, and maintain reliable ETL/ELT workflows using Databricks, Spark, and Python.
Enable Data Access & Analytics: Partner with analytics, product, and engineering teams to ensure timely, accurate, and governed access to data for downstream reporting and analytics.
Optimize Data Workflows: Improve performance, reduce latency, and streamline processes by tuning SQL, optimizing Spark jobs, and enhancing cloud data pipelines.
Leverage Cloud Infrastructure: Utilize AWS services (e.g., S3, Glue, Lambda) to manage and scale data engineering workloads.
Drive Best Practices: Establish and maintain data engineering standards, including code quality, data security, version control, and documentation.
Build & Maintain Data Models: Design and support dimensional and normalized data models that serve cross-functional use cases and reporting needs.
Automate & Monitor: Set up robust pipeline orchestration (e.g., with Airflow, Databricks Jobs, or AWS Step Functions) along with monitoring and alerting systems.
Collaborate Cross-Functionally: Work with data analysts, scientists, and business users to understand requirements and transform raw data into business-ready datasets.
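In practice, the "raw data into business-ready datasets" step above often reduces to deduplicating and standardizing records before loading them into a governed table. A minimal sketch in plain Python, as a simplified stand-in for the equivalent Spark/Delta Lake job (the record fields and normalization rules here are hypothetical):

```python
from datetime import datetime

def transform(raw_records):
    """Deduplicate raw records by id, keeping the latest version by
    timestamp, and normalize fields into a business-ready shape."""
    latest = {}
    for rec in raw_records:
        key = rec["id"]
        ts = datetime.fromisoformat(rec["updated_at"])
        # Keep only the most recent record per id.
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, rec)
    # Emit normalized rows sorted by id for deterministic loads.
    return [
        {"id": k, "email": r["email"].strip().lower(), "updated_at": t.isoformat()}
        for k, (t, r) in sorted(latest.items())
    ]

raw = [
    {"id": 1, "email": " A@Example.COM ", "updated_at": "2024-01-01T00:00:00"},
    {"id": 1, "email": "a@example.com",   "updated_at": "2024-06-01T00:00:00"},
    {"id": 2, "email": "B@example.com",   "updated_at": "2024-03-15T00:00:00"},
]
clean = transform(raw)
```

In a real Databricks pipeline the same dedupe-and-normalize logic would typically run as a PySpark window function or a Delta Lake `MERGE`, but the shape of the transformation is the same.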
Requirements
3+ years of experience as a Data Engineer or in a similar role.
Strong hands-on experience with Databricks (Spark, Delta Lake) and Python-based ETL frameworks.
Solid experience working with AWS cloud services for data processing and storage.
Moderate-to-advanced proficiency in SQL for data wrangling, transformation, and performance tuning.
Experience with data lake architectures, ELT/ETL development, and orchestration tools.
Familiarity with software engineering best practices, including CI/CD, version control, and code reviews.
Experience with Power BI or another BI tool (e.g., Tableau, Looker) to support data visualization and self-service reporting.
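The SQL performance tuning called for above usually starts with reading the query plan and indexing the filter column. A small self-contained illustration using SQLite via Python's standard library (table and column names are made up for the example; the same explain-then-index workflow applies in warehouse engines, with engine-specific syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT SUM(total) FROM orders WHERE customer_id = ?"

# Without an index, the plan shows a full table scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchone()[3]

# Add an index on the filter column, then re-check the plan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchone()[3]

print(plan_before)  # a SCAN of the orders table
print(plan_after)   # a SEARCH using idx_orders_customer
```

The fourth column of each `EXPLAIN QUERY PLAN` row is SQLite's human-readable plan detail; seeing it switch from a scan to an index search is the quickest confirmation that the index is actually being used.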