Tags: ETL, Pandas, PySpark, Python, Jupyter, Data Engineering, Data Mining, Git, GitHub, Bitbucket, Version Control, Communication, Remote Work
Role Overview
Perform data analysis and data mining to support business decisions
Formulate and validate data-driven hypotheses
Develop and maintain ETL processes using PySpark
Ensure high-quality, production-ready data pipelines and Python code
Collaborate with business and technical stakeholders
Continuously identify opportunities for improvement and innovation
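The ETL responsibilities above center on the PySpark DataFrame API. As a Spark cluster may not be at hand, here is a minimal sketch of a typical transform step using Pandas (which is also in the stack); the data and column names are hypothetical, and the PySpark equivalent would use `dropna` and `groupBy(...).agg(F.sum(...))` on a Spark DataFrame in much the same shape.

```python
import pandas as pd

# Hypothetical raw transactions standing in for a source table.
raw = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "amount": [10.0, None, 5.0, 7.5, 2.5],
    "country": ["CZ", "CZ", "DE", "DE", "CZ"],
})

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing amounts and aggregate spend per country.

    In PySpark the same logic would be:
        df.dropna(subset=["amount"])
          .groupBy("country")
          .agg(F.sum("amount").alias("total_amount"))
    """
    cleaned = df.dropna(subset=["amount"])
    return (
        cleaned.groupby("country", as_index=False)["amount"]
        .sum()
        .rename(columns={"amount": "total_amount"})
    )

result = transform(raw)
```

Keeping the transform as a pure function over a DataFrame makes it easy to unit-test and to port between Pandas and PySpark.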
Requirements
At least 3 years of experience working with data (analysis, data science, or data engineering)
Experience in data mining, including hypothesis formulation and testing, target definition, and performance evaluation (coverage, lift, robustness), whether gained in a previous position or in side projects
Ability to deliver high-quality results, even for routine and repetitive tasks
Hands-on experience with PySpark for ETL and data transformations
Practical knowledge of Pandas or Polars (at least one required)
Experience with Git and version control tools (Bitbucket or GitHub)
Ability to follow best practices for production Python code (Jupyter for exploration, clean .py files for production)
Strong communication skills, with the ability to collaborate with both analysts and business units
Comfortable working in an iterative environment, incorporating feedback
Proactive mindset, able to propose and implement new data-driven hypotheses
Good command of English (written and spoken)
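The evaluation metrics named in the requirements (coverage and lift) have simple definitions worth keeping in mind: coverage is the share of the population a rule or model flags, and lift is the target rate inside that flagged segment relative to the overall base rate. A minimal sketch with Pandas, on hypothetical data:

```python
import pandas as pd

# Hypothetical scored population: a rule's flag vs. the actual target.
df = pd.DataFrame({
    "flagged": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "target":  [1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
})

base_rate = df["target"].mean()        # overall target rate in the population
segment = df[df["flagged"] == 1]       # the sub-population the rule selects

coverage = len(segment) / len(df)      # share of population flagged
precision = segment["target"].mean()   # target rate within the flagged segment
lift = precision / base_rate           # improvement over random selection
```

A lift above 1.0 means the rule concentrates the target better than random; robustness is then checked by recomputing these metrics on held-out or later time periods.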
Tech Stack
ETL
Pandas
PySpark
Python
Benefits
Life insurance
Pension contribution
EN/DE/ES lessons
MultiSport card
Sick days
Meal allowance
Lots of company events
Quality amenities for lovers of sport, food, and coffee