Python, Spark, SQL, Machine Learning, ML, Natural Language Processing, MLOps, Databricks, Git, Source Control
About this role
Role Overview
Support the training, monitoring and improvement of a suite of data science models (primarily text-based classification models)
Assist in the development of new product concepts
Support the review and evaluation of potential new datasets
Assist in developing and implementing efficient strategies for creating high-quality labelled training datasets, leveraging automation, weak supervision, and active learning techniques
Design, implement, and maintain rule-based data processing logic leveraging regex and other pattern-matching approaches
Assist in developing monitoring systems for in-life machine learning models that automatically detect and flag issues
Work with stakeholders to define and implement new machine learning applications based on transaction data
Requirements
2+ years’ experience working with large datasets in a commercial or academic environment.
Experience in SQL and Python in a professional context.
Fast learner who is comfortable with uncertainty and change; we are a scale-up.
Comfortable working with data cleaning, transformation, and basic scripting tasks.
Strong attention to detail and a focus on data quality.
Experience monitoring and enhancing in-life ML models (MLOps).
Familiarity with regex or willingness to learn quickly.
Knowledge of, or experience with, developing production code and using Git for source control.
Knowledge of, or experience with, Spark/Databricks.
Familiarity with classification, time series, and/or natural language processing.
Knowledge of, or experience working with, consumer data, banking data, or stocks and shares.
Planning skills to help you prioritise work across multiple projects.