SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity (1506.02594v1)

Published 8 Jun 2015 in cs.SI, physics.soc-ph, and stat.AP

Abstract: Social networking websites allow users to create and share content. Big information cascades of post resharing can form as users of these sites reshare others' posts with their friends and followers. One of the central challenges in understanding such cascading behaviors is in forecasting information outbreaks, where a single post becomes widely popular by being reshared by many users. In this paper, we focus on predicting the final number of reshares of a given post. We build on the theory of self-exciting point processes to develop a statistical model that allows us to make accurate predictions. Our model requires no training or expensive feature engineering. It results in a simple and efficiently computable formula that allows us to answer questions, in real-time, such as: Given a post's resharing history so far, what is our current estimate of its final number of reshares? Is the post resharing cascade past the initial stage of explosive growth? And, which posts will be the most reshared in the future? We validate our model using one month of complete Twitter data and demonstrate a strong improvement in predictive accuracy over existing approaches. Our model gives only 15% relative error in predicting final size of an average information cascade after observing it for just one hour.

Authors (5)
  1. Qingyuan Zhao (43 papers)
  2. Murat A. Erdogdu (45 papers)
  3. Hera Y. He (2 papers)
  4. Anand Rajaraman (2 papers)
  5. Jure Leskovec (233 papers)
Citations (628)

Summary

Prediction of Information Cascades on Social Networks: A Self-Exciting Point Process Approach

The paper presents a statistical model for predicting the final size of an information cascade, that is, the total number of reshares a post eventually receives on a platform such as Twitter. The authors propose a self-exciting point process model named SEISMIC (Self-Exciting Model of Information Cascades). It is designed to produce accurate predictions at low computational cost and, as the abstract states, requires no training or expensive feature engineering.

Key Contributions

SEISMIC builds upon the theory of self-exciting point processes to model the reshare dynamics of social media content. In such a process every reshare exposes a new set of followers to the post and so raises the rate of future reshares, which captures the "rich-get-richer" phenomenon in which initially popular posts are more likely to gain further traction. The model works by estimating the infectiousness of a post, meaning the probability that a user exposed to it reshares it, and it allows this quantity to change over time. SEISMIC can be applied in real time: predictions are computed from the resharing history observed so far and can be updated shortly after a post is shared.
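For readers who prefer symbols, the self-exciting intensity described above can be written as follows. The notation is introduced here for illustration and is not defined in the summary itself: $\lambda_t$ is the reshare rate at time $t$, $p_t$ the infectiousness, $t_i$ the time of the $i$-th reshare (with $t_0$ the original post), $n_i$ the number of followers of the user who posted at $t_i$, and $\phi$ the memory kernel describing human reaction time.

```latex
\lambda_t \;=\; p_t \sum_{t_i \le t} n_i \, \phi(t - t_i),
\qquad \int_0^\infty \phi(s)\, ds = 1 .
```

Each past reshare contributes a term weighted by its audience size $n_i$, so a reshare by a highly followed user increases the rate more than one by a user with few followers.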

Methodology

  1. Memory Kernel and Human Reaction Time: The model incorporates a memory kernel that describes the distribution of human reaction times. The kernel is constant for a short initial period and then decays as a power law. This accounts for the delay between a user seeing a post and resharing it.
  2. Infectiousness Estimation: Infectiousness is dynamically estimated using a non-parametric approach. This estimation adjusts as the content ages, acknowledging changes in its likelihood to be reshared over time.
  3. Prediction and Identification of Cascade States: SEISMIC can forecast whether a cascade is in a supercritical (explosive growth) or subcritical (dying out) state. This is essential in determining the predictability of a cascade's ultimate size.
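The three steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the kernel parameters `THETA` and `S0` and the mean audience size `N_STAR` are illustrative values chosen here, the infectiousness estimate is the plain ratio of observed reshares to accumulated exposure (without the time-weighting a non-parametric estimator would normally apply), and no correction factors are applied to the prediction.

```python
import math

# Illustrative values (assumptions, not taken from the summary above).
THETA = 0.25    # power-law exponent of the kernel tail
S0 = 300.0      # seconds during which the kernel is constant
N_STAR = 100.0  # assumed mean number of followers of a resharing user


def memory_kernel(s, theta=THETA, s0=S0):
    """Reaction-time density: constant up to s0, then power-law decay."""
    c = theta / (s0 * (1.0 + theta))  # normalises the kernel to integrate to 1
    if s <= 0:
        return 0.0
    if s <= s0:
        return c
    return c * (s / s0) ** (-(1.0 + theta))


def kernel_cdf(s, theta=THETA, s0=S0):
    """Closed-form integral of memory_kernel from 0 to s."""
    c = theta / (s0 * (1.0 + theta))
    if s <= 0:
        return 0.0
    if s <= s0:
        return c * s
    return c * s0 + (c * s0 / theta) * (1.0 - (s / s0) ** (-theta))


def estimate_infectiousness(times, followers, t):
    """Unweighted estimate: observed reshares divided by accumulated exposure.

    times[0] is the original post; followers[i] is the audience of post i.
    """
    seen = [(ti, ni) for ti, ni in zip(times, followers) if ti <= t]
    if not seen:
        raise ValueError("no posts observed up to time t")
    reshares = len(seen) - 1  # the original post is not a reshare
    exposure = sum(ni * kernel_cdf(t - ti) for ti, ni in seen)
    p_hat = reshares / exposure if exposure > 0 else 0.0
    return p_hat, reshares, exposure


def predict_final_size(times, followers, t, n_star=N_STAR):
    """Predicted final reshare count; math.inf signals a supercritical cascade."""
    p_hat, reshares, exposure = estimate_infectiousness(times, followers, t)
    if p_hat * n_star >= 1.0:
        return math.inf  # each reshare is expected to trigger at least one more
    total_followers = sum(ni for ti, ni in zip(times, followers) if ti <= t)
    remaining = total_followers - exposure  # audience that has not yet reacted
    return reshares + p_hat * remaining / (1.0 - p_hat * n_star)


# Made-up cascade: post at t=0, reshares after 1 and 10 minutes.
example_times = [0.0, 60.0, 600.0]
example_followers = [1000, 500, 200]
print(predict_final_size(example_times, example_followers, t=3600.0))
```

A single pass over the observed reshares is enough for both the estimate and the prediction. Returning `math.inf` in the supercritical case makes explicit that no finite final size can be predicted while the cascade is still in explosive growth.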

Results

The model is reported to reach a relative error of about 15% in the predicted final cascade size after observing one hour of resharing activity, and to be roughly 30% more accurate than the state-of-the-art approaches it is compared against. It is also computationally efficient: the cost of a prediction scales linearly with the number of reshares observed, which makes SEISMIC suitable for real-time use on large datasets.
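To make the error figure concrete, one common way to define relative error for a single cascade is the absolute percentage error, aggregated over many cascades with the median. The sketch below assumes that definition, which the summary does not spell out, and the function names and numbers are made up for illustration.

```python
from statistics import median


def absolute_percentage_error(predicted, actual):
    """|predicted - actual| / actual for one cascade with a known final size."""
    if actual <= 0:
        raise ValueError("actual final size must be positive")
    return abs(predicted - actual) / actual


def median_ape(predictions, actuals):
    """Median absolute percentage error over a collection of cascades."""
    return median(
        absolute_percentage_error(p, a) for p, a in zip(predictions, actuals)
    )


# Made-up numbers: predicting 115 reshares for a cascade that ends at 100.
print(absolute_percentage_error(115, 100))
```

Under this definition a 15% error means that a cascade ending at 100 reshares would be predicted at roughly 85 or 115.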

Implications

Practically, SEISMIC is relevant to content ranking, trend forecasting, and the study of how information spreads in social networks. Because it produces accurate predictions quickly, it could help platforms such as Twitter improve content delivery and ranking algorithms. Theoretically, the model offers insight into the mechanisms that drive information spread and could be adapted to other types of networks or content.

Future Directions

The authors suggest that SEISMIC could be extended with knowledge of the network structure to refine its predictions. Content-based features or temporal patterns in user behavior could also give the infectiousness estimate richer context. More broadly, the framework leaves room for research on incorporating additional data sources to adapt the model to different network configurations.

In conclusion, SEISMIC offers a robust, theoretically grounded, and computationally efficient approach to modeling and predicting information cascades on social networks, opening new opportunities for practitioners and researchers to refine and deploy real-time predictive analytics at scale.