- The paper introduces a novel hybrid framework combining a multi-dimensional Hawkes process with a Graph Neural Network to model temporal and structural opinion dynamics on social media.
- It leverages fine-grained sentiment labels and hierarchical comment structures from the VISTA dataset to predict future comment propagation and sentiment diffusion.
- Baseline experiments show improved sentiment and structural prediction accuracy over traditional models, although the evaluation is limited to Weibo data.
This paper introduces a framework called "Rhythm of Opinion" for analyzing how public opinions spread and evolve on social media, addressing limitations in existing models that often overlook hierarchical comment structures, temporal dynamics, and cross-topic interactions.
Problem:
Traditional methods for modeling opinion dynamics struggle with the complexity of modern social media discussions. They often fail to capture:
- The hierarchical nature of comments (replies to replies).
- The temporal dynamics and mutual influence between comments over time.
- Multi-dimensional aspects like sentiment propagation.
- Interactions between different trending topics happening concurrently.
Furthermore, existing datasets are often inadequate, lacking hierarchical structure, sufficient time coverage, fine-grained annotations, or information about cross-topic influences.
Proposed Solution: Hawkes-Graph Framework
The core idea is to combine a multi-dimensional Hawkes process with a Graph Neural Network (GNN):
- Multi-dimensional Hawkes Process:
  - Models the temporal arrival rate (intensity) of comments.
  - Each dimension $\omega = (l, c)$ represents a unique combination of hierarchy level ($l$, up to 3 levels) and sentiment category ($c$, 11 categories), giving up to $3 \times 11 = 33$ dimensions.
  - The intensity $\lambda_{\omega}(t)$ for a specific comment type depends on a baseline rate $\mu_{\omega}$ and on the excitation $\alpha_{\omega,\omega'}$ from past comments of every type $\omega'$, weighted by an exponential decay kernel $\phi_{\omega,\omega'}$.
  - This captures how comments of one type (e.g., an angry reply at level 2) can trigger further comments, possibly of other types (e.g., a neutral reply at level 3).
  - Parameters are estimated by Maximum Likelihood Estimation (MLE).
  - The fitted process predicts the expected number of comments of each (hierarchy, sentiment) type in a future time window.
- Graph Neural Network (GNN):
  - Models the structural evolution and sentiment diffusion within the comment hierarchy.
  - Constructs an opinion propagation graph in which nodes are comments and edges are parent-child reply relationships.
  - Node features: the Hawkes intensity $\lambda_{\omega}(t)$ and a derived sentiment probability distribution $q_c(v)$.
  - Edge features: the time difference between the two comments and the Hawkes excitation strength $\alpha_{\omega,\omega'}$ between their comment types.
  - A message-passing mechanism updates each node embedding from the node's own features and from neighbor information aggregated with edge-feature weights.
  - Performs two main tasks:
    - Node Classification: predicts the sentiment label of each comment node.
    - Edge Prediction (implicit): learns structural relationships through the graph structure loss.
  - Training minimizes a combined loss: a sentiment prediction loss (cross-entropy between the predicted sentiment and the Hawkes-derived distribution) plus a graph structure loss (the discrepancy between predicted and true edge features and structure).
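For reference, the Hawkes component can be written out in the standard form, assuming the common exponential kernel $\phi_{\omega,\omega'}(s) = e^{-\beta_{\omega,\omega'} s}$ with decay rates $\beta_{\omega,\omega'}$ (the exact kernel parameterization is an assumption; the summary only says "exponential decay"). Here $t_j^{\omega'}$ denotes the time of the $j$-th past comment of type $\omega'$, and the log-likelihood maximized by MLE is taken over an observation window $[0, T]$:

$$
\begin{aligned}
\lambda_{\omega}(t) &= \mu_{\omega} + \sum_{\omega'} \sum_{t_j^{\omega'} < t} \alpha_{\omega,\omega'}\, e^{-\beta_{\omega,\omega'}\left(t - t_j^{\omega'}\right)}, \\
\log \mathcal{L} &= \sum_{\omega} \left[ \sum_{t_i^{\omega} \le T} \log \lambda_{\omega}\!\left(t_i^{\omega}\right) - \int_0^T \lambda_{\omega}(s)\, ds \right], \\
\int_0^T \lambda_{\omega}(s)\, ds &= \mu_{\omega} T + \sum_{\omega'} \sum_{t_j^{\omega'} < T} \frac{\alpha_{\omega,\omega'}}{\beta_{\omega,\omega'}} \left(1 - e^{-\beta_{\omega,\omega'}\left(T - t_j^{\omega'}\right)}\right).
\end{aligned}
$$

The closed-form integral (the compensator) is what makes the exponential kernel convenient for MLE: the likelihood can be evaluated without numerical integration.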
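The Hawkes component described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`dim_index`, `intensity`, `log_likelihood`, `expected_counts`) are hypothetical, the kernel is assumed to be $e^{-\beta_{\omega,\omega'} s}$ with a decay matrix `beta`, and the window forecast is a first-order estimate that ignores comments triggered inside the forecast window.

```python
import numpy as np

# Dimension layout (from the summary): 3 hierarchy levels x 11 sentiment categories.
N_LEVELS = 3
N_SENTIMENTS = 11
D = N_LEVELS * N_SENTIMENTS  # 33 Hawkes dimensions


def dim_index(level: int, sentiment: int) -> int:
    """Hypothetical mapping of (level in 1..3, sentiment in 0..10) to omega in 0..32."""
    return (level - 1) * N_SENTIMENTS + sentiment


def intensity(t, times, dims, mu, alpha, beta):
    """lambda_omega(t) for every omega, driven by comments strictly before t.

    times: (n,) comment timestamps; dims: (n,) comment types omega_j
    mu: (D,) baseline rates; alpha, beta: (D, D) arrays indexed [omega, omega']
    """
    lam = mu.astype(float).copy()
    before = times < t
    for t_j, w_j in zip(times[before], dims[before]):
        # each past comment of type w_j excites every type omega with its own decay
        lam += alpha[:, w_j] * np.exp(-beta[:, w_j] * (t - t_j))
    return lam


def log_likelihood(times, dims, T, mu, alpha, beta):
    """Log-likelihood on [0, T]: sum of log-intensities minus the compensator."""
    ll = sum(np.log(intensity(t_i, times, dims, mu, alpha, beta)[w_i])
             for t_i, w_i in zip(times, dims))
    compensator = mu.sum() * T
    for t_j, w_j in zip(times, dims):
        b = beta[:, w_j]
        compensator += np.sum(alpha[:, w_j] / b * (1.0 - np.exp(-b * (T - t_j))))
    return ll - compensator


def expected_counts(t, horizon, times, dims, mu, alpha, beta):
    """First-order estimate of expected comments per type in (t, t + horizon].

    Integrates only the intensity driven by history up to t; comments triggered
    inside the window are ignored, so this underestimates when excitation is strong.
    """
    counts = mu * horizon
    past = times <= t
    for t_j, w_j in zip(times[past], dims[past]):
        b = beta[:, w_j]
        counts = counts + alpha[:, w_j] / b * (
            np.exp(-b * (t - t_j)) - np.exp(-b * (t + horizon - t_j)))
    return counts
```

In a full MLE step, `log_likelihood` would be maximized over `mu`, `alpha`, and `beta` (kept positive, for example through a log parameterization) with a numerical optimizer. The nested loops favor readability over speed and cost O(n²) in the number of comments; exponential kernels also admit a recursive O(n) evaluation.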
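A single round of the edge-weighted message passing described above might look like the NumPy sketch below. The update rule (a sigmoid gate computed from the edge features [Δt, α], sum aggregation over both reply directions, ReLU) and all parameter names (`W_self`, `W_msg`, `w_edge`) are assumptions for illustration; the summary does not specify the GNN architecture.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def message_passing_layer(h, edges, edge_feats, W_self, W_msg, w_edge):
    """One round of edge-gated message passing on a reply tree (hypothetical design).

    h:          (n_nodes, d_in) node features, e.g. [lambda_omega(t_v), q_1(v), ..., q_11(v)]
    edges:      list of (parent, child) node-index pairs (reply relations)
    edge_feats: (n_edges, 2) edge features [time difference, alpha between the two types]
    W_self:     (d_in, d_out) transform of a node's own features
    W_msg:      (d_in, d_out) transform of neighbor features
    w_edge:     (2,) weights that turn edge features into a scalar gate
    """
    out = h @ W_self
    gates = sigmoid(edge_feats @ w_edge)  # one weight in (0, 1) per edge
    for (parent, child), g in zip(edges, gates):
        # reply edges are used in both directions: parent informs child and vice versa
        out[child] += g * (h[parent] @ W_msg)
        out[parent] += g * (h[child] @ W_msg)
    return np.maximum(out, 0.0)  # ReLU
```

Stacking several such layers lets sentiment information travel more than one reply hop; a sentiment classifier head (for node classification) would then read the final embeddings.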
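The combined training objective can be sketched as follows. Two details are assumptions rather than facts from the summary: the Hawkes-derived target $q_c(v)$ is taken to be the node's level-specific intensities normalized over the sentiment categories, and the graph structure loss is a mean squared error on edge features, weighted by a hypothetical coefficient `gamma`.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sentiment_targets(lam_by_sentiment):
    """Hypothetical q_c(v): per-node intensities over sentiments, normalized to sum to 1.

    lam_by_sentiment: (n_nodes, n_sentiments) values of lambda_(l_v, c)(t_v)
    """
    return lam_by_sentiment / lam_by_sentiment.sum(axis=-1, keepdims=True)


def combined_loss(logits, q, pred_edge_feats, true_edge_feats, gamma=1.0):
    """Cross-entropy against the Hawkes-derived targets plus a weighted edge-feature MSE."""
    p = softmax(logits)
    sentiment_loss = -np.mean(np.sum(q * np.log(p + 1e-12), axis=-1))
    structure_loss = np.mean((pred_edge_feats - true_edge_feats) ** 2)
    return sentiment_loss + gamma * structure_loss
```

Using soft targets $q_c(v)$ instead of one-hot labels ties the GNN's sentiment predictions to the temporal signal from the Hawkes process, which is the coupling the framework relies on.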
VISTA Dataset:
To facilitate this research, the authors introduce the VISTA dataset, collected from Weibo between 2024 and early 2025:
- Content: 159 trending topics, ~47k posts, ~327k level-2 (L2) comments, and ~29k level-3 (L3) comments.
- Structure: Captures tree-like hierarchical comment structures up to 3 levels.
- Annotation: Comments are annotated with 11 fine-grained sentiment labels (e.g., Angry, Anxious, Happy, Excited) using GLM-4-plus, followed by manual verification (Cohen's kappa = 0.85).
- Scope: Covers diverse domains (politics, entertainment, etc.) and aims to capture the full lifecycle of topics and potential cross-topic influences.
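Cohen's kappa corrects raw agreement for the agreement expected by chance, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement rate and $p_e$ the chance agreement implied by each annotator's label frequencies. The sketch below shows how a figure such as 0.85 is computed from two label sequences; the example pairing (for instance, GLM-4-plus labels versus human labels) and the label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n  # observed agreement
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)  # chance agreement
    if p_e == 1.0:
        # both annotators used one identical label everywhere; treat as perfect agreement
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

For example, two annotators who agree on 3 of 4 comments, with label frequencies that put chance agreement at 0.5, obtain $\kappa = (0.75 - 0.5)/(1 - 0.5) = 0.5$, noticeably lower than the raw 75% agreement.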
Task & Evaluation:
The goal is to predict future comments (temporal arrival and structural connections) based on past observations. The model is evaluated using:
- Sentiment Prediction Accuracy (SA): Accuracy of predicting comment sentiment labels.
- Structural Consistency Prediction Accuracy (SCA): Accuracy of predicting the set of child comments for each parent node.
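The two metrics can be computed as in the sketch below. It assumes that SCA counts a parent as correct only when its predicted set of children matches the true set exactly; the summary does not say whether partial matches earn credit. Function names, comment IDs, and labels are hypothetical.

```python
def sentiment_accuracy(pred_labels, true_labels):
    """SA: fraction of comments whose predicted sentiment equals the annotated label."""
    if len(pred_labels) != len(true_labels) or not true_labels:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == t for p, t in zip(pred_labels, true_labels)) / len(true_labels)


def structural_consistency_accuracy(pred_children, true_children):
    """SCA (exact-match assumption): share of parents whose predicted child set is correct.

    pred_children, true_children: dict mapping parent comment ID -> iterable of child IDs.
    A parent missing from pred_children is treated as having no predicted children.
    """
    if not true_children:
        raise ValueError("true_children must contain at least one parent")
    hits = sum(set(pred_children.get(parent, ())) == set(children)
               for parent, children in true_children.items())
    return hits / len(true_children)
```

Comparing sets rather than lists makes the metric insensitive to the order in which child comments are predicted.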
Baseline experiments on VISTA show that performance improves as the proportion of training data increases (15%, 20%, 25%).
Conclusion & Limitations:
The paper proposes an interpretable Hawkes-Graph framework and a rich dataset (VISTA) for studying complex opinion dynamics, jointly modeling temporal, structural, hierarchical, and sentiment aspects. Limitations include the restriction of the dataset to Weibo, data correlations that are not explicitly modeled (e.g., fan communities), assumptions about data noise, and a trade-off between interpretability and predictive power: the framework may fall short of black-box models in raw accuracy. The code is made available for reproducibility.