Twitter mood predicts the stock market

Published 14 Oct 2010 in cs.CE, cs.CL, cs.SI, and physics.soc-ph | (1010.3003v1)

Abstract: Behavioral economics tells us that emotions can profoundly affect individual behavior and decision-making. Does this also apply to societies at large, i.e., can societies experience mood states that affect their collective decision making? By extension is the public mood correlated or even predictive of economic indicators? Here we investigate whether measurements of collective mood states derived from large-scale Twitter feeds are correlated to the value of the Dow Jones Industrial Average (DJIA) over time. We analyze the text content of daily Twitter feeds by two mood tracking tools, namely OpinionFinder that measures positive vs. negative mood and Google-Profile of Mood States (GPOMS) that measures mood in terms of 6 dimensions (Calm, Alert, Sure, Vital, Kind, and Happy). We cross-validate the resulting mood time series by comparing their ability to detect the public's response to the presidential election and Thanksgiving day in 2008. A Granger causality analysis and a Self-Organizing Fuzzy Neural Network are then used to investigate the hypothesis that public mood states, as measured by the OpinionFinder and GPOMS mood time series, are predictive of changes in DJIA closing values. Our results indicate that the accuracy of DJIA predictions can be significantly improved by the inclusion of specific public mood dimensions but not others. We find an accuracy of 87.6% in predicting the daily up and down changes in the closing values of the DJIA and a reduction of the Mean Average Percentage Error by more than 6%.

Summary

The paper demonstrates that public mood data from Twitter, especially the GPOMS Calm metric, can predict DJIA changes with lags of 2–6 days.
It tests the predictive power of the mood time series with Granger causality analysis and an SOFNN model; adding the Calm series improves on a baseline that uses only past DJIA values.
The findings underscore practical implications for real-time sentiment filtering and integrating mood data into financial forecasting systems.

This paper, "Twitter mood predicts the stock market" (1010.3003), explores the hypothesis that collective mood states, derived from large-scale Twitter feeds, are correlated with and potentially predictive of changes in the Dow Jones Industrial Average (DJIA). The research aims to demonstrate how sentiment analysis of social media can offer early indicators of economic trends. This challenges the Efficient Market Hypothesis, under which prices respond to new information (news) and are therefore essentially unpredictable.

The study used a dataset of nearly 10 million public tweets posted between February 28 and December 19, 2008. A crucial data preprocessing step involved filtering tweets to include only those containing explicit mood statements (e.g., "i feel", "i'm feeling") and removing potential spam/informational messages (those with "http:" or "www."). Daily aggregations of these filtered tweets formed the basis for public mood time series. The corresponding daily DJIA closing values were obtained from Yahoo! Finance.
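The filtering step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the two mood phrases are only the examples named in this summary (the paper's full pattern list may be longer), and the `(date, text)` input shape and function names are assumptions made for the example.

```python
import re

# Assumed patterns: only the two example phrases quoted in the summary.
MOOD_PATTERN = re.compile(r"\bi feel\b|\bi'm feeling\b", re.IGNORECASE)
# Link markers used to drop spam/informational tweets.
LINK_PATTERN = re.compile(r"http:|www\.", re.IGNORECASE)


def is_mood_tweet(text: str) -> bool:
    """Keep tweets with an explicit mood statement and no link marker."""
    return bool(MOOD_PATTERN.search(text)) and not LINK_PATTERN.search(text)


def filter_by_day(tweets):
    """Group kept tweets by date; `tweets` is an iterable of (date, text) pairs."""
    by_day = {}
    for day, text in tweets:
        if is_mood_tweet(text):
            by_day.setdefault(day, []).append(text)
    return by_day
```

Days on which no tweet survives the filter simply do not appear in the result, so a caller would need to decide how to treat such gaps.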

Two mood assessment tools were employed:

OpinionFinder (OF): This tool measures mood along a single dimension: positive vs. negative. It calculates the ratio of positive to negative tweets based on a lexicon of positive and negative words.
Google-Profile of Mood States (GPOMS): Developed by the authors, GPOMS provides a more nuanced, six-dimensional view of public mood: Calm, Alert, Sure, Vital, Kind, and Happy. It uses an expanded lexicon derived from the well-established POMS psychometric instrument and word co-occurrences in large web corpora.
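As a rough illustration of the lexicon-matching idea behind the OpinionFinder score, the sketch below counts tweets containing at least one positive or negative term and returns their ratio for one day. The tiny word sets are hypothetical stand-ins for the real lexicon, and the whitespace tokenization and the handling of a zero denominator are assumptions.

```python
# Hypothetical toy lexicons; the real OpinionFinder lexicon is far larger.
POSITIVE = {"happy", "good", "great", "calm"}
NEGATIVE = {"sad", "bad", "afraid", "angry"}


def daily_polarity_ratio(tweets):
    """Ratio of tweets with a positive term to tweets with a negative term.

    Returns None when no negative tweet was seen (assumed convention).
    """
    pos = neg = 0
    for text in tweets:
        words = set(text.lower().split())
        pos += bool(words & POSITIVE)  # tweet counts once, however many terms
        neg += bool(words & NEGATIVE)
    return pos / neg if neg else None
```

A GPOMS-style score would follow the same pattern with one term list per mood dimension, yielding six daily values instead of one.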

Both the daily mood time series and the DJIA daily changes ($D_t = \text{DJIA}_t - \text{DJIA}_{t-1}$) were normalized to z-scores for comparability.
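A minimal sketch of the differencing and z-score step follows. It assumes a global mean and standard deviation over the whole series; this summary does not say whether the paper uses a global or a windowed normalization, so treat that choice as an assumption.

```python
from statistics import mean, pstdev


def daily_changes(closes):
    """D_t = DJIA_t - DJIA_{t-1} for consecutive closing values."""
    return [today - prev for prev, today in zip(closes, closes[1:])]


def zscores(series):
    """Normalize a series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        raise ValueError("constant series cannot be z-scored")
    return [(v - mu) / sigma for v in series]
```

After this step the mood series and the DJIA changes are on the same unitless scale, which is what makes overlaying and comparing them meaningful.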

The methodologies used to investigate the relationship between mood and DJIA were:

Cross-validation: The mood time series were validated by analyzing their response to known socio-cultural events with expected public emotional impact, specifically the US Presidential Election and Thanksgiving in late 2008. GPOMS was found to capture a more differentiated public mood response than OpinionFinder, showing distinct changes across multiple dimensions (e.g., drop in Calm before the election, spike in Happy on Thanksgiving).
Granger Causality Analysis: This econometric technique was used to assess whether lagged mood values statistically predict current DJIA changes better than lagged DJIA values alone. The analysis was performed on data from February 28 to November 3, 2008, to exclude the potentially anomalous period around the election/Thanksgiving. The key finding here was that only the GPOMS Calm dimension showed statistically significant Granger causality with DJIA changes, particularly at lags of 2 to 6 days. Other dimensions (Alert, Sure, Vital, Kind) and OpinionFinder's general positive/negative sentiment did not exhibit this predictive relationship in a linear model. Visualizations showed lagged Calm scores often aligning with subsequent DJIA value changes, except during major unexpected news like the bank bailout announcement.
Self-Organizing Fuzzy Neural Network (SOFNN) Prediction: To investigate non-linear relationships and the practical impact of including mood data on prediction accuracy, an SOFNN model was used. This model was trained to predict the next day's DJIA value. The inputs to the SOFNN included combinations of past 3 days of DJIA values and past 3 days of various mood time series values. The training period was Feb 28 to Nov 28, 2008, and the test period was Dec 1 to Dec 19, 2008.
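The Granger causality comparison described above can be sketched as two nested linear regressions: one predicting $D_t$ from its own lags, and one that also includes lagged mood values. The code below is a dependency-free illustration, not the authors' implementation; the function names are hypothetical, and it returns only the F statistic (a p-value would come from comparing it with an F distribution with `lag` and `n - 2*lag - 1` degrees of freedom).

```python
def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(X, y))


def granger_f(d, x, lag):
    """F statistic for 'lagged x improves the prediction of d beyond lagged d'."""
    ts = range(lag, len(d))
    y = [d[t] for t in ts]
    # Restricted model: intercept + own lags of d.
    restricted = [[1.0] + [d[t - i] for i in range(1, lag + 1)] for t in ts]
    # Full model: restricted regressors + lags of x.
    full = [row + [x[t - i] for i in range(1, lag + 1)]
            for row, t in zip(restricted, ts)]
    rss_r, rss_f = ols_rss(restricted, y), ols_rss(full, y)
    dof = len(y) - (2 * lag + 1)
    return ((rss_r - rss_f) / lag) / (rss_f / dof)
```

A large F value for a given lag indicates that the mood lags reduce the residual error substantially. As with any Granger test, this shows predictive information, not a causal mechanism.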

Different input combinations for the SOFNN were tested to compare performance:

$I_0$: Baseline using only the past 3 DJIA values.
$I_{OF}$: Baseline + OpinionFinder sentiment (past 3 days).
$I_1$: Baseline + GPOMS Calm (past 3 days).
$I_{1,X}$: Baseline + GPOMS Calm + GPOMS dimension $X$ (past 3 days).
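The input combinations listed above differ only in which lagged series are concatenated into each feature vector. The sketch below builds such vectors; it does not implement the SOFNN itself, and the function name and list-based data layout are assumptions for illustration.

```python
def build_inputs(djia, moods=(), n_lags=3):
    """Build (features, targets) for next-day prediction.

    Each feature row holds the previous `n_lags` DJIA values followed by the
    previous `n_lags` values of every series in `moods`; the target is the
    DJIA value on the day being predicted.
    """
    features, targets = [], []
    for t in range(n_lags, len(djia)):
        row = list(djia[t - n_lags:t])
        for series in moods:
            row.extend(series[t - n_lags:t])
        features.append(row)
        targets.append(djia[t])
    return features, targets
```

With this helper, the baseline corresponds to `build_inputs(djia)`, the Calm variant to `build_inputs(djia, [calm])`, and a two-dimension variant to `build_inputs(djia, [calm, happy])`, where `calm` and `happy` stand for already-normalized daily mood series.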

Prediction accuracy was measured by Mean Absolute Percentage Error (MAPE) and Direction Accuracy (correctly predicting an up or down movement).
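Both metrics are simple to compute. The sketch below is one reasonable reading of them; in particular, judging direction against the previous day's close and counting a zero change as "not up" are assumptions, since the summary does not specify how ties are handled.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def direction_accuracy(previous, actual, predicted):
    """Percent of days where predicted and actual moves share a direction.

    `previous` holds the prior day's close for each predicted day.
    """
    hits = sum((a - prev > 0) == (p - prev > 0)
               for prev, a, p in zip(previous, actual, predicted))
    return 100.0 * hits / len(actual)
```

MAPE rewards getting the level right, while direction accuracy only rewards getting the sign of the move right, which is why the two can rank input combinations differently.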

The SOFNN results provided crucial practical insights:

Including OpinionFinder's positive/negative sentiment ($I_{OF}$) did not improve prediction accuracy (MAPE 1.95%, Direction 73.3%) compared to the baseline ($I_0$: MAPE 1.94%, Direction 73.3%). This reinforces the Granger causality finding that general positive/negative sentiment wasn't the primary predictive factor.
Adding GPOMS Calm ($I_1$) significantly improved both MAPE (1.83%) and Direction Accuracy (86.7%) over the baseline and $I_{OF}$. This was the most impactful single mood dimension for prediction.
Adding other GPOMS dimensions in addition to Calm had mixed results. Some combinations (e.g., Calm+Sure, Calm+Vital) actually decreased accuracy, while Calm+Happy ($I_{1,6}$) resulted in the lowest MAPE (1.79%) and good direction accuracy (80.0%). The improved performance with Calm+Happy despite Happy's lack of independent linear Granger causality suggests a potentially non-linear interaction between these mood dimensions in influencing the market.

The 86.7% direction accuracy achieved with the Calm input (quoted as 87.6% in the abstract) was reported to be statistically significant, i.e. unlikely to arise by chance.
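For a back-of-the-envelope sense of what "unlikely by chance" means here, the sketch below computes a one-sided binomial tail probability. It is not the paper's own significance procedure. It assumes 15 test days (the trading days between Dec 1 and Dec 19, 2008, which is consistent with 13/15 = 86.7%), a coin-flip null hypothesis, and a single test window fixed in advance.

```python
from math import comb


def binomial_tail(hits, n, p=0.5):
    """P(at least `hits` successes in `n` independent trials with success rate p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(hits, n + 1))


# 13 correct direction calls out of 15 days under a coin-flip null.
p_value = binomial_tail(13, 15)  # 121 / 32768, roughly 0.0037
```

Under these assumptions the tail probability is well below 1%. The figure ignores that several input combinations were tried and that the test window is short, both of which would make a careful significance estimate less extreme.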

Practical Implementation Considerations:

Data Acquisition: Accessing and filtering large volumes of real-time tweets requires robust infrastructure and adherence to platform APIs and terms of service. Historical data like that used in the paper (2008) may be easier to obtain for research, but real-time prediction requires a live stream.
Sentiment Tooling: Implementing GPOMS requires building or obtaining the specialized lexicon and scoring mechanism. OpinionFinder is publicly available but found less effective in this study. The process relies on lexicon matching and simple aggregation, which is computationally lighter than complex deep learning models but may miss nuanced context.
Time Series Alignment: Aligning daily tweet data with daily stock market data requires careful handling of time zones, market holidays, and the definition of a "trading day." The paper explicitly notes not extrapolating for weekends/holidays.
Prediction Model: The paper used an SOFNN, a type of neural network suitable for non-linear time series. Implementing an SOFNN requires defining its structure and training process, including parameter tuning, although the paper states parameters were kept consistent across input variations for fair comparison. Other models such as LSTMs or ARIMA could also be considered for time series forecasting.
Lag Determination: The study found significant Granger causality for the Calm dimension at lags of 2 to 6 days, and the SOFNN used the past 3 days of values as input. In a real-world system, determining the appropriate lag for mood indicators relative to market movements might require ongoing analysis or be treated as a model parameter.
Computational Resources: Processing millions of tweets daily for sentiment analysis and training/running prediction models requires significant computational power, especially for real-time applications.
Limitations: The paper's findings are based on 2008 data. Twitter's user base, demographics, content, and public sentiment analysis methods have evolved significantly since then. Applying this directly today would require re-validating the mood tools and relationships with current market data. The specific causal mechanism remains unclear.
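The time series alignment and lag determination points above can be illustrated with two small helpers. This is a sketch under stated assumptions: dates are ISO-format strings (so they sort chronologically), days without a DJIA close are dropped rather than interpolated, and both function names are hypothetical.

```python
def align_on_trading_days(mood_by_date, close_by_date):
    """Keep only dates present in both series, in chronological order."""
    dates = sorted(set(mood_by_date) & set(close_by_date))
    moods = [mood_by_date[d] for d in dates]
    closes = [close_by_date[d] for d in dates]
    return dates, moods, closes


def lagged_pairs(mood, change, lag):
    """Pair the mood value at index i with the market change at index i + lag."""
    return list(zip(mood, change[lag:]))
```

Weekend and holiday mood values are simply discarded here because no close exists for those dates. Trying several values of `lag` on held-out data is one way to treat the lag as a tunable parameter rather than a fixed constant.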

In summary, the paper provides a strong empirical case that public mood, specifically the "Calm" dimension extracted from Twitter using the GPOMS tool, holds predictive power for short-term DJIA movements, improving prediction accuracy over models based solely on historical market data or general positive/negative sentiment. This highlights the potential of leveraging specific dimensions of collective emotional states from social media for financial forecasting, albeit with caveats regarding the specific tools, data sources, and time periods studied.
