About this role
Role Overview
Design and build production-grade data pipelines using batch, streaming, incremental, and CDC-based patterns.
Build ingestion workflows from operational systems such as MongoDB, PostgreSQL, RDS, APIs, and event streams.
Design and operate data migration workflows, including full load, incremental sync, CDC replay, cutover, rollback, and reconciliation.
Convert semi-structured or NoSQL data into reliable relational and analytical models.
Build and optimize data processing jobs using Python, PySpark, Spark SQL, SQL, and Databricks.
Orchestrate workflows using Apache Airflow and manage connectors using Airbyte or similar tools.
Maintain data quality, observability, alerting, backfills, and production reliability across pipelines.
Work with AWS services such as S3, Lambda, IAM, EC2, RDS, DMS, SQS, Kinesis, or similar.
Build modular, testable transformation layers using dbt where appropriate.
Document data flows, source-to-target mapping, pipeline behavior, data contracts, and operational runbooks.
Requirements
6+ years of experience in Data Engineering itself, not only in analytics, BI, or reporting.
Proven hands-on ownership of production data pipelines, including design, implementation, deployment, monitoring, debugging, and backfill/recovery.
Strong experience building ETL/ELT pipelines with full load, incremental load, and CDC-based patterns.
Good understanding of CDC correctness, including idempotency, deduplication, ordering, deletes/tombstones, late-arriving events, replay, and reconciliation.
Hands-on experience with OLTP databases, preferably MongoDB and PostgreSQL/RDS.
Practical experience with schema design, including relational modeling, constraints, indexing, normalization/denormalization, and source-to-target mapping.
Experience migrating or syncing data between operational systems and analytical platforms, including validation, cutover, rollback, and reconciliation.
Strong SQL skills, including joins, CTEs, window functions, query optimization, and debugging incorrect results.
Strong Python or PySpark experience for data processing, automation, validation, and pipeline development.
Experience with production pipeline reliability: retries, idempotency, monitoring, alerting, backfill, and incident handling.
Hands-on AWS data pipeline experience using services such as S3, Lambda, IAM, RDS, DMS, SQS, Kinesis, Glue, or equivalent.
Tech Stack
Airflow
Apache
AWS
EC2
ETL
MongoDB
NoSQL
Postgres
PySpark
Python
Spark
SQL
Benefits
Attractive salary range, open to negotiation if you're a strong fit.
Hybrid/remote-friendly culture: work where you grow best.
Flexible hours, async teamwork (we respect your focus time).
Work equipment support.
Allowance for Certification & Skill Development.
Year-end bonus & performance-based rewards.
15 days of paid leave per year.
Career growth with personal coaching sessions.
Open, collaborative team culture: no micromanagement, only trust.
Tools & AI-powered workflows that make remote work easier.