ETL, NumPy, Pandas, Python, SQL, Machine Learning, Data Engineering, Statistical Analysis, Remote Work
About this role
Role Overview
Participate in the full modeling lifecycle, from statistical analysis and experimentation to building, validating, and iterating on machine learning models that address critical business challenges
Own the data foundation by preparing, cleaning, and transforming raw, complex data into high-quality features for modeling. Proactively identify and handle missing values, outliers, and inconsistencies
Investigate data discrepancies (tracking bugs, ETL errors, definitional issues) and design automated frameworks to ensure data accuracy
Act as a strategic liaison, collaborating with Data Engineering and Product teams to drive the data strategy and definition of our centralized feature store, ensuring it becomes the 'single source of truth' for all ML models
Create and maintain clear, authoritative documentation for data sources, cleaning processes, and variable definitions
Requirements
Bachelor's degree (PhD preferred) in a quantitative field (Statistics, Physics, Mathematics, etc.)
Strong proficiency in Python (Pandas/NumPy) and SQL for complex querying and data manipulation
Hands-on experience with data cleaning techniques and data validation frameworks
Familiarity with data visualization tools to help identify and communicate data issues