Mining Tweets to Predict Future Bitcoin Price

Published 3 Dec 2024 in cs.AI | (2412.02148v1)

Abstract: Bitcoin has increased investment interests in people during the last decade. We have seen an increase in the number of posts on social media platforms about cryptocurrency, especially Bitcoin. This project focuses on analyzing user tweet data in combination with Bitcoin price data to see the relevance between price fluctuations and the conversation between millions of people on Twitter. This study also exploits this relationship between user tweets and bitcoin prices to predict the future bitcoin price. We are utilizing novel techniques and methods to analyze the data and make price predictions.

Summary

  • The paper demonstrates that ridge regression outperforms decision trees and neural networks in predicting next-day Bitcoin prices.
  • The paper finds that a predominance of neutral tweets limits the standalone predictive power of sentiment analysis.
  • The paper employs clustering and classification techniques, with Random Forest achieving 62% accuracy in predicting directional price shifts.

Analyzing Twitter Data for Bitcoin Price Prediction: A Methodological Exploration

The paper addresses the growing interest in using social media data for financial prediction, focusing on Bitcoin price forecasting from tweets. Its premise is that sentiment expressed on platforms such as Twitter has a non-negligible influence on cryptocurrency markets. The research attempts to quantify that influence by mining tweets, performing sentiment analysis, and applying several modeling techniques to predict Bitcoin price movements.

Methodologies Employed

The study follows a multi-stage pipeline of data extraction, preprocessing, exploratory data analysis (EDA), sentiment analysis, clustering, regression, and classification, with the goal of predicting Bitcoin price movements from social media activity.

  1. Data Collection and Preprocessing: A dataset comprising 16 million Bitcoin-related tweets spanning from January 2016 to March 2019 was used. Preprocessing filtered out non-English tweets and cleaned the remaining text by removing URLs and mentions.
  2. Sentiment Analysis: Sentiment analysis was performed using libraries like textblob and vaderSentiment, which categorized tweets as positive, negative, or neutral. About 90% of the tweets were classified as neutral, and the analysis did not establish a notable correlation between sentiment and Bitcoin price fluctuations.
  3. Clustering: The study explores clustering methodologies to group users based on their tweets' content and engagement metrics (likes, retweets). K-means, hierarchical clustering, and DBSCAN were tested, though computational limitations restricted the exhaustive use of hierarchical and DBSCAN clustering.
  4. Regression Analysis: Several regression models (linear, ridge, lasso) were employed for predicting next-day Bitcoin prices and compared against decision tree-based models and neural networks. The classical regression models, particularly ridge regression, performed best, while the neural networks generalized poorly to the test data.
  5. Classification: Various classifiers were utilized to predict directional shifts in Bitcoin price (up or down). The Random Forest classifier achieved the highest performance metrics with accuracy at 62% and an F1-score of 75%, outperforming other classifiers like KNN, Logistic Regression, and SVM.
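
The first two steps above can be illustrated with a minimal sketch. It assumes regex-based removal of URLs and mentions and a three-way label derived from a polarity score in [-1, 1]. The patterns, the function names `clean_tweet` and `label_sentiment`, and the 0.05 cutoff (the threshold conventionally used with VADER compound scores) are illustrative assumptions; the summary does not state the paper's exact rules.

```python
import re

# Hypothetical patterns: the paper's exact cleaning rules are not given here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs and @mentions, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def label_sentiment(score: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to positive, negative, or neutral.

    The default threshold is an assumption, not a value from the paper.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    raw = "@alice BTC to the moon https://t.co/abc123 !!"
    print(clean_tweet(raw))
    print(label_sentiment(0.0))
```

In a real pipeline the score passed to `label_sentiment` would come from a sentiment library such as vaderSentiment or textblob. A wide neutral band around zero is one plausible reason why short, factual tweets end up overwhelmingly in the neutral class.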

Key Findings and Implications

The paper reports several findings on the relationship between social media sentiment and Bitcoin's market behavior:

  • EDA Results: Tweet volume and engagement follow regular work hours and a weekly cycle, with interactions peaking on Fridays.
  • Sentiment Prevalence: The overrepresentation of neutral sentiment in tweets and the difficulty of correlating sentiment with price movements suggest that predictions based purely on sentiment analysis may require more sophisticated modeling frameworks or additional external data sources.
  • Model Performance: The performance gap between models underlines the importance of model selection, and the best model differed by task. Ridge regression led on next-day price regression, while the Random Forest classifier led on directional classification, which suggests that tree-based ensembles may be better suited to the up-or-down task than to predicting the price level itself.
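
An accuracy of 62% alongside an F1-score of 75% can look inconsistent at first glance. The worked example below shows how the two can coexist when "up" days are treated as the positive class and predicted often. The confusion-matrix counts are hypothetical, chosen only to reproduce those two figures; they are not taken from the paper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts over 100 test days, "up" being the positive class.
# 60 days are truly "up" (tp + fn) and 40 are truly "down" (fp + tn).
skewed = classification_metrics(tp=57, fp=35, fn=3, tn=5)

# Baseline on the same hypothetical days: always predict "up".
always_up = classification_metrics(tp=60, fp=40, fn=0, tn=0)

if __name__ == "__main__":
    print({k: round(v, 3) for k, v in skewed.items()})
    print({k: round(v, 3) for k, v in always_up.items()})
```

Under these assumed counts, the always-"up" baseline reaches the same F1 with only slightly lower accuracy, so the F1-score alone does not show how much directional skill a classifier has. Comparing against such a baseline, or reporting per-class metrics, gives a clearer picture.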

Theoretical and Practical Considerations

From a theoretical perspective, this study adds to the body of financial prediction methods by combining sentiment-driven analysis with traditional econometric models. However, the predominance of neutral sentiment labels and the varying efficacy of the prediction models highlight how difficult it is to isolate the causal impact of social media sentiment on market behavior.

Practically, the research suggests that incorporating social media analytics into trading strategies could enhance decision-making processes. However, further work is required to refine prediction models, potentially incorporating additional features like verified account flags or broader market indicators for improved accuracy.
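
For readers who want to experiment with the regression setup, the sketch below fits a one-feature ridge regression that predicts the next day's price from the current day's price. It is an illustration under stated assumptions, not the paper's model: the price series is made up, the single lagged-price feature stands in for the paper's feature set (not listed in this summary), and `fit_ridge_1d` is a hypothetical helper using the closed-form solution with an unpenalized intercept.

```python
def fit_ridge_1d(x, y, alpha=1.0):
    """Closed-form ridge fit for one feature with an unpenalized intercept.

    Minimizes sum((y - w*x - b)**2) + alpha * w**2.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx + alpha == 0:
        raise ValueError("x is constant and alpha is 0; the slope is undefined")
    w = sxy / (sxx + alpha)   # the penalty shrinks the slope toward zero
    b = y_mean - w * x_mean   # intercept recovered from the means
    return w, b


# Made-up daily closing prices, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0]
x = prices[:-1]  # price on day t
y = prices[1:]   # price on day t + 1

w_ols, b_ols = fit_ridge_1d(x, y, alpha=0.0)       # alpha = 0 is least squares
w_ridge, b_ridge = fit_ridge_1d(x, y, alpha=10.0)  # penalized fit
next_day_estimate = w_ridge * prices[-1] + b_ridge

if __name__ == "__main__":
    print(f"OLS slope:   {w_ols:.3f}")
    print(f"Ridge slope: {w_ridge:.3f}")
    print(f"Next-day estimate: {next_day_estimate:.2f}")
```

The only difference from ordinary least squares is the `alpha` term in the denominator, which shrinks the slope. With several correlated features, such as tweet volume, engagement, and sentiment counts, that shrinkage is what typically stabilizes the coefficients. Features are usually standardized before fitting so the penalty treats them on a comparable scale.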

Future Directions

The research opens avenues for future exploration, particularly in better understanding sentiment dynamics' role in market movements. Expanding the dataset scope and incorporating enhanced feature sets, such as broader economic indicators or cross-referencing with other social media platforms, could yield more robust predictive models. Additionally, diversifying the focus to include various cryptocurrencies may offer deeper insights into the generalizability of these methods across different digital assets.

In conclusion, while the results point to links between Twitter activity and Bitcoin market trends, sentiment alone showed no notable correlation with price. The market's inherent volatility and unpredictability call for continued methodological innovation and broader data integration before the predictive power of social media signals can be relied on.
