Writing and improving data pipelines that load data into our data lake and transform it for the feature store
Ensuring all data is available in the right format and at reliable quality. Improving data quality and addressing challenges such as data shift and concept drift
Testing data pipelines in staging environments and deploying them to production
Writing proprietary packages and frameworks for internal use that make standard tasks easier (such as data loading, model testing, and exploratory data analysis)
Taking part in A/B testing and monitoring solutions in collaboration with ML Engineers and Data Scientists
Requirements
Python / pandas / PySpark
Experience with building data-powered solutions
Experience with building cloud-based solutions
Experience with development, deployment and monitoring of data and machine learning solutions
Strong coding skills (clean and commented code, version control, documentation) and use of AI tools to boost coding performance
Knowledge of big data infrastructure
Knowledge of and interest in DataOps and MLOps, namely development, testing, deployment and monitoring of data and ML solutions, using tools such as MLflow, Kubeflow, Airflow or similar, as well as Git and Docker
Interest in data science, machine learning, and forecasting is a big plus
Tech Stack
Airflow
Cloud
Docker
Pandas
PySpark
Python
Benefits
25+ days off, as well as a birthday day off and 4 charity days off per year
Flexible start and end of the working day and a hybrid working mode combining remote and in-office work
Team-centric atmosphere
Encouragement of a healthy lifestyle and work-life balance, including supplemental health insurance