Designing, building, and maintaining scalable, end-to-end data pipelines for ingesting, cleaning, transforming, and integrating large structured and semi-structured datasets
Optimizing data collection, processing, and storage workflows
Running periodic data refreshes through the data pipelines
Building a robust ETL infrastructure using SQL technologies
Assisting with data migration to the new platform
Automating manual workflows and optimizing data delivery
Developing data transformation logic using SQL and dbt for Snowflake
Designing and implementing scalable and high-performance data models
Creating matching logic to deduplicate and connect entities across multiple sources (see the second sketch below this list)
Ensuring data quality, consistency, and performance to support downstream applications
Orchestrating data workflows using Apache Airflow running on Kubernetes (see the first sketch below this list)
Monitoring and troubleshooting data pipeline performance and operations
Enabling integration of third-party and pre-cleaned data into a unified schema with rich metadata and hierarchical relationships
Writing data processing logic in Python
Applying software engineering best practices: version control (Git), CI/CD pipelines (GitHub Actions), and DevOps workflows
Ensuring code quality using tools like SonarQube
Documenting data processes and workflows
Preparing the platform for future integrations (e.g., REST APIs, LLM/agentic AI)
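To give a flavor of the dbt and Airflow responsibilities above, here is a minimal sketch of a DAG that runs and then tests dbt models against Snowflake. It is illustrative only, not the team's actual setup: the DAG id, schedule, and the /opt/dbt paths are hypothetical placeholders, and it assumes Airflow 2.4+ with dbt Core available on the workers.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal sketch, not the team's actual DAG: the DAG id, schedule,
# and paths below are hypothetical placeholders.
with DAG(
    dag_id="daily_dbt_refresh",       # hypothetical name
    schedule="0 4 * * *",             # e.g. a daily 04:00 refresh
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["dbt", "snowflake"],
) as dag:
    # Build all dbt models; dbt reads the Snowflake connection from
    # profiles.yml (assumed here to live in the project directory).
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt --profiles-dir /opt/dbt",
    )

    # Run dbt tests after the models build, so data quality gates
    # what downstream applications see.
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt --profiles-dir /opt/dbt",
    )

    dbt_run >> dbt_test
```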
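And a toy illustration of the entity-matching work: the blocking key below (normalized name plus website domain) is purely illustrative, assumes Python 3.10+, and stands in for the richer rules a production matcher would use against the platform's real entity model.

```python
from dataclasses import dataclass


# Illustrative only: the record shape and matching rules are hypothetical.
@dataclass
class CompanyRecord:
    source: str
    name: str
    website: str | None


def match_key(record: CompanyRecord) -> str:
    """Build a simple blocking key: normalized name plus bare domain."""
    name = "".join(ch for ch in record.name.lower() if ch.isalnum())
    domain = ""
    if record.website:
        domain = (
            record.website.lower()
            .removeprefix("https://")
            .removeprefix("http://")
            .removeprefix("www.")
            .split("/")[0]
        )
    return f"{name}|{domain}"


def deduplicate(records: list[CompanyRecord]) -> dict[str, list[CompanyRecord]]:
    """Group records from multiple sources that share a matching key."""
    groups: dict[str, list[CompanyRecord]] = {}
    for record in records:
        groups.setdefault(match_key(record), []).append(record)
    return groups
```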
Requirements
Strong experience with Snowflake and dbt (must-have)
Strong SQL skills, including experience with query optimization
Experience with orchestration tools like Apache Airflow, Azure Data Factory (ADF), or similar
Experience with Docker, Kubernetes, and CI/CD practices for data workflows
Experience in working with large-scale datasets
Very good understanding of data pipeline design concepts and best practices
Strong coding skills in Python
Ability to write clean, scalable, and testable code, including unit tests (see the sketch after this list)
Understanding of object-oriented programming (OOP) and the ability to apply it in practice
Experience with version control systems: Git
Good knowledge of English (minimum C1 level)
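As a sketch of the unit-testing expectation above: a small, self-contained transformation function with pytest-style tests might look like the following. The function and its validation rules are hypothetical examples, not project code.

```python
import pytest


def normalize_country_code(value: str) -> str:
    """Trim whitespace and upper-case a raw country code, e.g. ' de ' -> 'DE'."""
    cleaned = value.strip().upper()
    if len(cleaned) != 2 or not cleaned.isalpha():
        raise ValueError(f"not a two-letter country code: {value!r}")
    return cleaned


def test_normalize_country_code():
    assert normalize_country_code(" de ") == "DE"


def test_normalize_country_code_rejects_garbage():
    with pytest.raises(ValueError):
        normalize_country_code("Germany")
```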
Tech Stack
Apache Airflow
Azure
dbt
Docker
ETL
Kubernetes
Python
Snowflake
SQL
Benefits
Flexible working hours and work mode: fully remote, in the office, or hybrid
Professional growth supported by internal training sessions and a training budget
Solid onboarding with a hands-on approach to give you an easy start
A great atmosphere among professionals who are passionate about their work