Work day-to-day with quant modellers to prepare, refine and maintain datasets used for research, modelling and analysis
Help improve the structure, quality and usability of underlying data so that it can be consumed efficiently by quant workflows
Investigate data issues affecting modelling outputs, identifying root causes and working with relevant teams to resolve them
Support the development of repeatable data preparation processes that make research datasets more reliable, consistent and easier to work with
Build and maintain Python-based data workflows and supporting pipelines for ingestion, transformation and validation of modelling data
Maintain and further develop Pythia’s historical data assets, ensuring they remain accurate, accessible and fit for analytical use
Work with engineers to improve upstream and downstream data flows, helping ensure that critical data is captured and processed effectively
Support data migrations, backfills and structural improvements where required to improve the usefulness and reliability of modelling datasets
Contribute to the development of tooling and processes that make it easier to explore, prepare and troubleshoot data used by the quant team
Ensure data quality and integrity through validation, reconciliation and targeted monitoring across key datasets
Expand visibility into data issues by improving checks, alerts and investigative workflows across critical pipelines and sources
Define and improve data logic, transformations and assumptions, ensuring they are clearly documented and consistently applied across datasets
Improve the clarity and usability of data through better documentation, metadata management and standardisation of definitions
Work closely with engineering and operational teams to resolve anomalies, gaps and inconsistencies in source data
Contribute to the ongoing evolution of Pythia’s data capabilities, balancing immediate modelling needs with longer-term improvements to data quality and maintainability
Requirements
Strong experience in a Quant Data Engineer, Research Data Engineer or similar role working with complex datasets
Strong Python experience for data processing, investigation and workflow development
Excellent SQL skills and strong experience working with relational databases, preferably PostgreSQL
Proven experience preparing, transforming and validating datasets for analytical, modelling or research use cases
Proven ability to investigate data issues and trace problems through pipelines, transformations and source systems
Experience building and maintaining data pipelines or processing workflows in production environments
A strong understanding of data quality, reconciliation and validation practices
Experience working closely with technical stakeholders to understand how data is consumed and how it can be improved
Confidence working with messy, incomplete or evolving datasets and turning them into reliable assets for downstream users
Experience with analytical data warehouse technologies such as ClickHouse, BigQuery, Snowflake, Redshift or similar would be beneficial
Experience with Git-based version control (preferably GitLab) and with collaboration tools such as Jira and Confluence
Experience working in Agile environments and collaborating with distributed teams
Ability to work well in a dynamic, fast-paced environment and quickly adapt to new technologies and requirements
Strong attention to detail and a passion for problem solving, with excellent verbal and written communication skills