Tags: Airflow, Apache, AWS, Docker, EC2, ETL, Kubernetes, PySpark, Spark SQL, AI, ELT, Data Engineering, Data Lake, Analytics, Databricks, Apache Airflow, dbt, EKS, Lambda, S3, IAM, Communication, Remote Work
Role Overview
Design and build scalable ETL/ELT pipelines using both batch and streaming approaches
Develop ingestion workflows from multiple sources such as databases, APIs, and event streams
Implement ingestion strategies including full load, incremental load, and CDC (a minimal merge sketch follows this list)
Orchestrate data workflows using Apache Airflow (an example DAG sketch also follows this list)
Manage data connectors using Airbyte
Work with Databricks Lakehouse to build and optimize data processing pipelines
Write and optimize complex SQL queries for analytics and transformation
Build modular and testable data models using dbt (staging → intermediate → marts)
Maintain data quality, observability, and reliability across the platform
Work with AWS services such as S3, Lambda, EC2, and IAM
Containerize data services using Docker and Kubernetes (EKS) when needed
Document pipelines, data models, and data dictionaries for long-term maintainability
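To give a rough sense of the incremental/CDC loading described above, here is a minimal PySpark sketch of an upsert into a Delta table, the kind of merge pattern commonly used on Databricks. The S3 paths, table location, and the customer_id key are illustrative assumptions, not details of this team's actual pipelines.

```python
# Minimal sketch: incremental (upsert-style) load of changed rows into a Delta
# table. Paths, table locations, and column names (customer_id, updated rows,
# etc.) are illustrative assumptions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Changed rows landed by an ingestion tool (e.g. Airbyte) into a staging area.
updates_df = spark.read.parquet("s3://example-bucket/staging/customers/")

# Existing Delta table that the new batch should be merged into.
target = DeltaTable.forPath(spark, "s3://example-bucket/lake/customers/")

# Upsert: update rows whose key already exists, insert the rest.
(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```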
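Likewise, a minimal sketch of the kind of Airflow orchestration involved: two ordered tasks in a daily DAG. The DAG id, task bodies, and schedule are placeholders, and the `schedule` argument assumes Airflow 2.4 or later.

```python
# Minimal sketch of an Airflow DAG with two ordered tasks; the DAG id, schedule,
# and task bodies are placeholders (assumes Airflow 2.4+ for the `schedule` arg).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull data from a source (database, API, event stream).
    print("extract step")


def transform():
    # Placeholder: clean and reshape the extracted batch.
    print("transform step")


with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task  # extract runs before transform
```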
Requirements
At least 4 years of experience in Data Engineering
Strong understanding of data architectures such as Data Lake, Data Warehouse, and Lakehouse
Hands-on experience with ETL/ELT pipelines, including batch and streaming processing
Familiarity with ingestion patterns: full load, incremental, CDC, and event-driven
Experience working with Databricks (Delta Live Tables, Jobs, Notebooks)
Strong skills in PySpark or Spark SQL for large-scale data processing
Solid understanding of Delta Lake (ACID, time travel, schema evolution)
Experience with Apache Airflow (DAGs, scheduling, monitoring)
Experience with Airbyte or similar ingestion tools