Design and implement data processing systems, including data warehouses, data lakes, and real-time processing platforms;
Configure and manage technologies such as Hadoop, Spark, and Kafka, as well as cloud environments across Azure, AWS, and GCP;
Build and maintain automated ETL/ELT processes for data collection, cleansing, and transformation;
Ensure seamless, reliable data flow between diverse systems and sources, with a strong focus on data quality and consistency;
Optimize data systems for high-volume, high-velocity workloads;
Design and implement distributed computing solutions that maintain performance at scale, proactively identifying and resolving bottlenecks.
Requirements
Hands-on experience with PySpark for large-scale data processing;
Strong knowledge of Apache Kafka for real-time data streaming;
Cloud platform experience across Azure, AWS, and/or GCP;
Proven ability to design and optimize ETL/ELT pipelines;
Familiarity with Hadoop ecosystems and distributed computing principles;
Solid understanding of data warehouse and data lake architectures;
Nice to have: experience with infrastructure-as-code tools (e.g. Terraform, Bicep); knowledge of data governance and security best practices; exposure to orchestration tools such as Apache Airflow or Azure Data Factory.
Tech Stack
Airflow
Apache
AWS
Azure
Cloud
ETL
Google Cloud Platform
Hadoop
Kafka
PySpark
Spark
Terraform
Benefits
Flexible Working Hours – Manage your workday with flexibility and the option to work from home when needed, while enjoying our city-centre office as a convenient, collaborative workspace;
Culture & Connection – From team bonding activities like Christmas parties and summer events to spontaneous celebrations, monthly breakfasts, or team lunches, we celebrate wins—big or small—together;