The Voleon Group is a technology company that specializes in applying AI and machine learning techniques to finance. The company is seeking a Data Scientist for its Feature Engineering team to turn complex datasets into predictive signals for machine learning models. The work spans the full path from data sourcing to feature construction and validation.
Responsibilities:
- Explore, profile, and curate complex and often messy datasets from third-party vendors and internal sources, developing a deep understanding of what each dataset can and cannot tell us
- Harness financial intuition, academic research, and statistical rigor to inform the design and implementation of predictive features in a collaborative setting
- Validate features through a disciplined, test-driven framework — including cross-sectional analysis, stationarity testing, and point-in-time correctness — to ensure signals are real and not artifacts of data issues
- Build and maintain data pipelines that bring features from prototype to production, with monitoring for data health and correctness along the way
- Communicate your findings clearly — both the signal you've found and the story of how the data produces it — to researchers and leadership
- Proactively investigate anomalies in data feeds and production behavior, performing root-cause analysis and surfacing issues to relevant stakeholders
- Leverage AI tools to accelerate exploration, coding, and analysis — and share what you learn about effective workflows with the team
Requirements:
- 2 years of applied industry experience (including internships) working end-to-end with complex datasets: curation, querying, aggregation, exploratory analysis, and visualization
- Experience using statistical methods to analyze data, identify patterns, conduct root-cause analysis, and translate findings into actionable insights
- Ability to frame and answer questions mathematically
- Ability to infer useful forward-looking directions from the results of retrospective analysis
- Fluency in managing, processing, and visualizing tabular data using SQL and Python (Pandas or Polars)
- Basic software development skills and experience with bash, Linux/Unix, and git
- Ability to refine ambiguous requests into well-scoped analyses and communicate results with clarity and precision
- Bachelor's degree in a quantitative discipline (statistics, data science, computer science, economics, physics, or a related field)
Preferred qualifications:
- Master's degree in a quantitative discipline
- Prior industry experience or demonstrated interest in finance — academic projects, coursework in financial engineering, or industry internships
- Familiarity with financial datasets such as Compustat, IBES, or similar vendor data
- Experience developing in a production-facing environment with standard tooling (CI/CD, git, workflow orchestration)
- Hands-on experience with AI coding assistants or LLM-based tools in a data science or engineering workflow
- A track record of curiosity-driven exploration — side projects, Kaggle competitions, research papers, or anything that shows you can't leave an interesting dataset alone