Improving Stock Market Prediction via Heterogeneous Information Fusion

Published 2 Jan 2018 in cs.SI, physics.soc-ph, and q-fin.ST | (1801.00588v1)

Abstract: Traditional stock market prediction approaches commonly use the historical price-related data of stocks to forecast their future trends. As information on the Web grows, some recent works have explored financial news to improve prediction. Effective indicators, e.g., events related to the stocks and people's sentiments towards the market and stocks, have been shown to play important roles in stock volatility, and are extracted and fed into prediction models to improve prediction accuracy. However, a major limitation of previous methods is that the indicators are obtained either from a single source, whose reliability might be low, or from several data sources whose interactions and correlations are largely ignored. In this work, we extract events from Web news and users' sentiments from social media, and investigate their joint impact on stock price movements via a coupled matrix and tensor factorization framework. Specifically, a tensor is first constructed to fuse heterogeneous data and capture the intrinsic relations among the events and the investors' sentiments. Due to the sparsity of the tensor, two auxiliary matrices, the stock quantitative feature matrix and the stock correlation matrix, are constructed and incorporated to assist the tensor decomposition. The intuition behind this is that stocks that are highly correlated with each other tend to be affected by the same event. Thus, instead of conducting each stock prediction task separately and independently, we predict multiple correlated stocks simultaneously through their commonalities, which are enabled by sharing the collaboratively factorized low-rank matrices between the matrices and the tensor. Evaluations on China A-share stock data and HK stock data from 2015 demonstrate the effectiveness of the proposed model.

Citations (166)

Summary

The paper proposes a novel framework that integrates heterogeneous information from news, social media, and historical data using coupled matrix and tensor factorization to improve stock prediction.
Empirical evaluation on China A-share and HK markets showed significant improvements in prediction accuracy, reaching 62.5% and 61.7% respectively.
The framework offers practical promise for enhancing investor decision-making and theoretical potential for future multi-source data fusion in financial markets.

In the paper titled "Improving Stock Market Prediction via Heterogeneous Information Fusion," the authors propose a framework that draws on several information sources to improve the accuracy of stock market prediction. Traditional forecasting approaches rely mainly on historical price data and overlook information available from other sources. To address this limitation, the study integrates heterogeneous sources, namely events extracted from Web news and user sentiments from social media, and models their joint effect with a coupled matrix and tensor factorization framework.

Methodology

The methodology rests on the premise that the stock market is influenced by many interconnected factors that single-source models cannot adequately capture. The framework builds a third-order tensor that fuses data from different sources and captures the intrinsic relations among events and investors' sentiments. Because this tensor is sparse, the authors introduce two auxiliary matrices, the stock quantitative feature matrix and the stock correlation matrix, to support its decomposition. The matrices and the tensor are factorized collaboratively and share low-rank factor matrices, which lets the model predict the movements of multiple stocks simultaneously by exploiting their commonalities.
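
The coupling can be illustrated with a small NumPy sketch. It is not the authors' formulation: it assumes a CP (rank-R) decomposition of the stock × event × sentiment tensor T with factor matrices X, E, S, approximates the feature matrix F as X Vᵀ, and brings in the correlation matrix C through a graph-Laplacian penalty, giving the objective ½‖M ∘ (T − [[X, E, S]])‖² + (α/2)‖F − X Vᵀ‖² + (β/2) tr(Xᵀ L X) + (λ/2)(‖X‖² + ‖E‖² + ‖S‖² + ‖V‖²), where M marks observed tensor cells and L = D − C. These modeling choices, the helper names `cp_reconstruct` and `coupled_factorize`, the weights and the synthetic data are all hypothetical. The point it shows is that the stock factor X appears in every term, so the dense auxiliary matrices constrain the factors learned from the sparse tensor.

```python
import numpy as np

def cp_reconstruct(X, E, S):
    """Rebuild a 3-way tensor from CP factors: T[i, j, k] = sum_r X[i, r] * E[j, r] * S[k, r]."""
    return np.einsum("ir,jr,kr->ijk", X, E, S)

def coupled_factorize(T, mask, F, C, rank=3, alpha=1.0, beta=0.1, lam=0.01,
                      lr=0.01, n_iter=500, seed=0):
    """Gradient-descent sketch of coupled matrix-tensor factorization (hypothetical objective).

    T    : (n_stocks, n_events, n_sentiments) tensor; mask is 1 on observed cells, 0 elsewhere
    F    : (n_stocks, n_features) stock quantitative feature matrix
    C    : (n_stocks, n_stocks) symmetric, non-negative stock correlation matrix
    The stock factor X is shared by all three data terms.
    """
    rng = np.random.default_rng(seed)
    n, m, s = T.shape
    X = 0.1 * rng.standard_normal((n, rank))
    E = 0.1 * rng.standard_normal((m, rank))
    S = 0.1 * rng.standard_normal((s, rank))
    V = 0.1 * rng.standard_normal((F.shape[1], rank))
    L = np.diag(C.sum(axis=1)) - C  # graph Laplacian of the correlation matrix

    def loss():
        R = mask * (T - cp_reconstruct(X, E, S))
        return (0.5 * np.sum(R ** 2)
                + 0.5 * alpha * np.sum((F - X @ V.T) ** 2)
                + 0.5 * beta * np.trace(X.T @ L @ X)
                + 0.5 * lam * sum(np.sum(M ** 2) for M in (X, E, S, V)))

    history = [loss()]
    for _ in range(n_iter):
        R = mask * (cp_reconstruct(X, E, S) - T)  # residual on observed cells only
        RF = X @ V.T - F                          # residual on the feature matrix
        # All gradients are computed from the current factors before any update.
        gX = (np.einsum("ijk,jr,kr->ir", R, E, S) + alpha * RF @ V
              + beta * L @ X + lam * X)
        gE = np.einsum("ijk,ir,kr->jr", R, X, S) + lam * E
        gS = np.einsum("ijk,ir,jr->kr", R, X, E) + lam * S
        gV = alpha * RF.T @ X + lam * V
        X -= lr * gX
        E -= lr * gE
        S -= lr * gS
        V -= lr * gV
        history.append(loss())
    return X, E, S, V, history

# Toy usage with synthetic data (all sizes and values are made up).
rng = np.random.default_rng(42)
n_stocks, n_events, n_sent, n_feat = 6, 4, 3, 5
T = rng.random((n_stocks, n_events, n_sent))
mask = (rng.random(T.shape) < 0.3).astype(float)  # roughly 30% observed: a sparse tensor
F = rng.standard_normal((n_stocks, n_feat))
returns = rng.standard_normal((60, n_stocks))     # 60 days of hypothetical returns
C = np.abs(np.corrcoef(returns, rowvar=False))
np.fill_diagonal(C, 0.0)
X, E, S, V, history = coupled_factorize(T, mask, F, C)
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}")
```

Plain gradient descent keeps the sketch short; alternating least squares or a quasi-Newton optimizer would usually converge faster on models of this kind.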

The method therefore follows a multi-task learning approach, predicting multiple correlated stocks simultaneously. This departs from traditional approaches, in which each stock is predicted independently and information shared between stocks is ignored. The underlying intuition is that highly correlated stocks tend to be affected by the same events, so evidence about one stock can inform the prediction for the stocks that co-move with it.
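
One way to make this intuition concrete, assuming (as many graph-regularized factorization models do, though the paper may differ) that the stock correlation matrix C enters as a smoothness penalty on the stock factors, is the standard graph-Laplacian identity tr(Xᵀ L X) = ½ Σᵢⱼ Cᵢⱼ ‖xᵢ − xⱼ‖², with L = D − C and D the diagonal matrix of row sums of C. A large Cᵢⱼ makes it costly for stocks i and j to have different latent factors, which is how information is shared between tasks. The function names and toy numbers below are hypothetical.

```python
import numpy as np

def laplacian_penalty(X, C):
    """tr(X^T L X) with L = D - C, the graph Laplacian of a symmetric correlation matrix C."""
    L = np.diag(C.sum(axis=1)) - C
    return float(np.trace(X.T @ L @ X))

def pairwise_penalty(X, C):
    """The same quantity written as 0.5 * sum_ij C_ij * ||x_i - x_j||^2."""
    diffs = X[:, None, :] - X[None, :, :]  # (n, n, rank) pairwise factor differences
    return float(0.5 * np.sum(C * np.sum(diffs ** 2, axis=-1)))

# Stocks 0 and 1 are strongly correlated; stock 2 is only weakly related to both.
C = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
similar = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])     # stocks 0 and 1 close in factor space
dissimilar = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])  # stocks 0 and 1 far apart
print(laplacian_penalty(similar, C), laplacian_penalty(dissimilar, C))
```

With these numbers the first penalty is about 0.39 and the second about 1.98, so separating the strongly correlated stocks 0 and 1 costs roughly five times more; both functions return the same value for any symmetric C.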

Results and Implications

The model is evaluated on China A-share and Hong Kong stock data from 2015. It achieves prediction accuracies of 62.5% on the China A-share market and 61.7% on the HK market, outperforming traditional baselines, which supports the effectiveness of integrating multi-source information.
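
The summary does not define the accuracy metric. A common choice for movement prediction, assumed here, is directional accuracy: the share of observations on which the predicted direction of the price change matches the actual one. The function `directional_accuracy` and the sample values are hypothetical.

```python
import numpy as np

def directional_accuracy(predicted, actual):
    """Share of observations where the predicted direction (sign) matches the actual one.

    Zero values count as a separate 'flat' direction because np.sign(0) == 0.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if predicted.shape != actual.shape:
        raise ValueError("predicted and actual must have the same shape")
    return float(np.mean(np.sign(predicted) == np.sign(actual)))

# Hypothetical predicted returns versus realized up (+1) / down (-1) labels over 8 days.
predicted = [0.2, -0.1, 0.3, -0.4, 0.1, 0.05, -0.2, 0.3]
actual = [1, -1, -1, -1, 1, 1, 1, 1]
print(directional_accuracy(predicted, actual))  # 6 of 8 directions match -> 0.75
```

When up and down days are roughly balanced, random guessing scores about 50%, which is the natural reference point for figures such as 62.5% and 61.7%.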

These findings have several implications. Practically, more accurate predictions could help investors and financial analysts make better-informed decisions. Theoretically, the study opens avenues for further work on multi-source data fusion in financial markets, which could lead to models that capture market dynamics more thoroughly. The coupled matrix and tensor factorization framework could also serve as a basis for integrating additional data types, such as time-series data and external macroeconomic indicators, into predictive models.

Future Developments

Several lines of future research could extend this work. One promising avenue is to incorporate deep learning techniques into the tensor factorization framework to capture more complex patterns in the data. In addition, since the evaluation covers only 2015, extending it to a multi-year dataset could provide further insight into the model's robustness and adaptability over time.

The framework could also be extended to process data in real time, updating predictions as new news articles and social media posts arrive. Such adaptability matters in fast-moving markets and could make predictions more timely and relevant.

In conclusion, the paper presents a framework for stock market prediction through heterogeneous information fusion. By integrating multiple information sources into a collaborative predictive model, it improves over prior methods and lays a foundation for future work in predictive financial analytics.