FI-2010 lower-complexity conjecture explaining generalization gap

Determine whether the performance gap observed between the FI-2010 and NASDAQ datasets in stock price trend prediction arises because FI-2010 is less complex than NASDAQ stocks such as Tesla and Intel from 2015, owing to the lower liquidity and market efficiency of Finnish stocks and to the dataset's 2010 time period.

Background

The authors report substantially higher F1-scores on the FI-2010 benchmark than on the NASDAQ stocks Tesla and Intel. They hypothesize that the Finnish stocks in FI-2010 are less liquid and less efficient, and that the older 2010 period makes the dataset easier to predict, both of which would contribute to better model performance.

They explicitly frame this explanation as a conjecture, suggesting that establishing or refuting it would clarify why models that perform well on FI-2010 often struggle on more efficient markets.

References

We conjecture that this is due to the fact that FI-2010 is characterized by a lower level of complexity with respect to NASDAQ stocks.

— TLOB: A Novel Transformer Model with Dual Attention for Price Trend Prediction with Limit Order Book Data (2502.15757 - Berti et al., 12 Feb 2025) in Results, Subsection “Tesla and Intel results”
