Design and implement reliable ELT/ETL data pipelines, using Airbyte for ingestion and dbt for data modeling and transformation
Configure and monitor software-defined data assets using Dagster. Troubleshoot pipeline failures and ensure SLAs for data freshness are met
Develop and optimize Databricks (Delta Lake) tables. Write efficient Spark/SQL queries to handle large datasets
Write clean, maintainable Python scripts for custom data extraction or utility tasks. Contribute to the team's codebase via Git/GitHub
Monitor containerized workloads on AWS EKS (Kubernetes). Assist in debugging pod failures and resource bottlenecks
Implement tests (dbt tests, Great Expectations) to catch data anomalies early and ensure trust in our data products
Participate in on-call rotation for critical incidents and drive post-mortems to prevent recurrence
Requirements
5+ years’ experience in a Data Engineering role
Strong SQL skills with experience in complex queries, performance tuning, and data modeling (dimensional models, wide analytical tables, and curated “gold” datasets)
Strong Python skills for data manipulation, automation, and scripting
Hands-on experience with dbt (models, testing, and documentation)
Exposure to Databricks and reverse ETL tools
Experience with data orchestration tools such as Dagster, Airflow, or Prefect
Working knowledge of cloud platforms, ideally AWS (S3, EC2, etc.), with exposure to Azure or GCP also valued
Familiarity with Git and modern engineering practices (CI/CD, code reviews, Infrastructure as Code basics)
Experience using AI-assisted development tools (e.g. GitHub Copilot, Cursor)