Design and build production-grade data pipelines using batch, streaming, incremental, and CDC-based patterns.
Build ingestion workflows from operational systems such as MongoDB, PostgreSQL, RDS, APIs, and event streams.
Design and operate data migration workflows, including full load, incremental sync, CDC replay, cutover, rollback, and reconciliation.
Convert semi-structured or NoSQL data into reliable relational and analytical models.
Build and optimize data processing jobs using Python, PySpark, Spark SQL, SQL, and Databricks.
Orchestrate workflows using Apache Airflow and manage connectors using Airbyte or similar tools.
Maintain data quality, observability, alerting, backfills, and production reliability across pipelines.
Work with AWS services such as S3, Lambda, IAM, EC2, RDS, DMS, SQS, and Kinesis.
Build modular, testable transformation layers using dbt where appropriate.
Document data flows, source-to-target mapping, pipeline behavior, data contracts, and operational runbooks.

6+ years of experience in data engineering proper, not solely in analytics, BI, or reporting.
Proven hands-on ownership of production data pipelines, including design, implementation, deployment, monitoring, debugging, and backfill/recovery.
Strong experience building ETL/ELT pipelines with full load, incremental load, and CDC-based patterns.
Good understanding of CDC correctness, including idempotency, deduplication, ordering, deletes/tombstones, late-arriving events, replay, and reconciliation.
Hands-on experience with OLTP databases, preferably MongoDB and PostgreSQL/RDS.
Practical experience with schema design, including relational modeling, constraints, indexing, normalization/denormalization, and source-to-target mapping.
Experience migrating or syncing data between operational systems and analytical platforms, including validation, cutover, rollback, and reconciliation.
Strong SQL skills, including joins, CTEs, window functions, query optimization, and debugging incorrect results.
Strong Python or PySpark experience for data processing, automation, validation, and pipeline development.
Experience with production pipeline reliability: retries, idempotency, monitoring, alerting, backfill, and incident handling.
Hands-on experience building data pipelines on AWS with services such as S3, Lambda, IAM, RDS, DMS, SQS, Kinesis, and Glue, or their equivalents.

Data Engineer

Key skills