
Diffusive Logistic Model Towards Predicting Information Diffusion in Online Social Networks (1108.0442v1)

Published 1 Aug 2011 in cs.SI, math.AP, and physics.soc-ph

Abstract: Online social networks have recently become an effective and innovative channel for spreading information and influence among hundreds of millions of end users. Many prior work have carried out empirical studies and proposed diffusion models to understand the information diffusion process in online social networks. However, most of these studies focus on the information diffusion in temporal dimension, that is, how the information propagates over time. Little attempt has been given on understanding information diffusion over both temporal and spatial dimensions. In this paper, we propose a Partial Differential Equation (PDE), specifically, a Diffusive Logistic (DL) equation to model the temporal and spatial characteristics of information diffusion in online social networks. To be more specific, we develop a PDE-based theoretical framework to measure and predict the density of influenced users at a given distance from the original information source after a time period. The density of influenced users over time and distance provides valuable insight on the actual information diffusion process. We present the temporal and spatial patterns in a real dataset collected from Digg social news site, and validate the proposed DL equation in terms of predicting the information diffusion process. Our experiment results show that the DL model is indeed able to characterize and predict the process of information propagation in online social networks. For example, for the most popular news with 24,099 votes in Digg, the average prediction accuracy of DL model over all distances during the first 6 hours is 92.08%. To the best of our knowledge, this paper is the first attempt to use PDE-based model to study the information diffusion process in both temporal and spatial dimensions in online social networks.


This paper, "Diffusive Logistic Model Towards Predicting Information Diffusion in Online Social Networks" (Wang et al., 2011), presents a novel approach using Partial Differential Equations (PDEs) to model and predict how information spreads through online social networks, considering both time and network distance simultaneously. Previous work often focused solely on the temporal aspect (how many people are influenced over time), but this paper addresses the spatio-temporal diffusion problem: determining the density of influenced users at a specific distance from the source after a certain time.

Core Concept: The Diffusive Logistic (DL) Model

The central idea is to model information spread as two interacting processes:

Growth Process: Information spreading among users who are at the same distance from the source. This is modeled using the standard logistic growth equation, commonly used in population dynamics: $\frac{\partial I}{\partial t} = rI(1-\frac{I}{K})$. Here, $I$ is the density of influenced users, $r$ is the intrinsic growth rate (how fast influence spreads within the group), and $K$ is the carrying capacity (the maximum possible density).
Diffusion Process: Information spreading randomly between users at different distances from the source. This captures spread beyond direct friend-of-friend links, like discovering content on a front page or via search. This is modeled using Fick's law of diffusion: $d \frac{\partial^2 I}{\partial x^2}$, where $d$ is the diffusion rate (how fast information travels across distances) and $x$ represents the distance.
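
As a quick check on the growth term in isolation (a standard textbook result, not something taken from the paper): if diffusion is switched off ($d = 0$) and $r$ is held constant, the logistic equation has a closed-form solution. The symbols $I_0$ and $t_0$ (the density at the start time) are introduced here only for illustration.

$I(t) = \frac{K}{1 + \left(\frac{K}{I_0} - 1\right) e^{-r (t - t_0)}}, \qquad I(t_0) = I_0$

For $0 < I_0 < K$ this rises monotonically from $I_0$ and levels off at $K$, which is the saturation behavior the carrying capacity is meant to capture.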

Combining these leads to the Diffusive Logistic (DL) equation:

$\frac{\partial I}{\partial t}=d \frac{\partial^2 I}{\partial x^2}+r I(1-\frac{I}{K})$

This PDE describes how the density of influenced users $I(x,t)$ changes over time $t$ and distance $x$. The model includes:

Initial Condition: $I(x, 1) = \phi(x)$, representing the observed density distribution at the start time (e.g., $t=1$ hour).
Boundary Conditions: $\frac{\partial I}{\partial x}(l,t)=\frac{\partial I}{\partial x}(L,t)=0$, where $l$ and $L$ are the minimum and maximum distances considered. This is a Neumann boundary condition, meaning no information flows out of the defined distance boundaries (it stays within the network).

The paper proves two key properties of this model:

Unique Property: The model guarantees a unique, positive solution for $I(x,t)$ bounded between 0 and $K$.
Strictly Increasing Property: If the initial density $\phi(x)$ meets certain conditions, the density $I(x,t)$ will strictly increase over time, aligning with the intuition that influence spreads but doesn't retract.

Defining Distance in Social Networks

Since "distance" isn't inherently spatial in online networks, the paper proposes and evaluates two metrics:

Friendship Hops: The shortest path length (number of friendship links) between the information source (initiator) and another user in the network graph.
Shared Interests: A measure of dissimilarity based on content interaction history (e.g., voted/digged stories). Defined as $d_{a,b} = 1 - \frac{|C_a \cap C_b|}{|C_a \cup C_b|}$, where $C_a$ and $C_b$ are the sets of content interacted with by users $a$ and $b$. A lower value means higher shared interest (closer distance).
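
A minimal sketch of both metrics, assuming the friendship graph is available as an adjacency dictionary and each user's history as a set of story IDs. The function names, the toy data, and the choice to treat two users with empty histories as maximally distant are illustrative assumptions, not details from the paper.

```python
from collections import deque


def hop_distances(friends, source):
    """Breadth-first search: fewest friendship links from source to each reachable user."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        for friend in friends.get(user, ()):
            if friend not in dist:  # first visit is the shortest path in BFS
                dist[friend] = dist[user] + 1
                queue.append(friend)
    return dist


def interest_distance(items_a, items_b):
    """Shared-interest distance: 1 minus the Jaccard similarity of two content sets."""
    union = items_a | items_b
    if not union:
        return 1.0  # assumption: no history at all means maximally distant
    return 1.0 - len(items_a & items_b) / len(union)


# Toy data (hypothetical users and stories)
friends = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
hops = hop_distances(friends, "a")
d_ab = interest_distance({"s1", "s2", "s3"}, {"s2", "s3", "s4"})
```

Users who are unreachable from the source simply do not appear in `hops`, so they drop out of any later per-distance grouping.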

Implementation Details

Implementing the DL model involves several practical steps:

Data Preparation:
- Identify the source/initiator of an information cascade (e.g., the first user to vote for a story).
- Build the social network graph (friendship links).
- For each user who gets influenced (e.g., votes), record the timestamp.
- Calculate the distance ($x$) from the source to every other user using the chosen metric (friendship hops or shared interests). Pre-calculating shortest paths (e.g., using Breadth-First Search for hops) is necessary. Calculating shared interests requires access to user-item interaction histories.
- Group users by distance $x$.
- Calculate the density $I(x,t)$ at discrete time points $t$: (Number of influenced users at distance $x$ by time $t$) / (Total number of users at distance $x$). The carrying capacities reported for the Digg experiment ($K=25$ and $K=60$) suggest that this ratio is expressed as a percentage.
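
The density calculation in the last step can be sketched as follows. The inputs (`distance`, mapping each non-source user to a distance, and `vote_time`, mapping influenced users to the hour they voted) and the `as_percent` switch are hypothetical names chosen for this illustration.

```python
from collections import Counter


def density_by_distance(distance, vote_time, t, as_percent=True):
    """I(x, t): share of the users at each distance x who were influenced by time t."""
    total = Counter(distance.values())  # users per distance group
    influenced = Counter(
        distance[user]
        for user, voted_at in vote_time.items()
        if user in distance and voted_at <= t
    )
    scale = 100.0 if as_percent else 1.0  # assumption: report percentages by default
    return {x: scale * influenced[x] / total[x] for x in sorted(total)}


# Toy data (hypothetical): two users at distance 1, four at distance 2
distance = {"b": 1, "c": 1, "d": 2, "e": 2, "f": 2, "g": 2}
vote_time = {"b": 0.5, "d": 0.8, "e": 2.5}  # hours since the story was submitted

density_t1 = density_by_distance(distance, vote_time, t=1.0)
density_t3 = density_by_distance(distance, vote_time, t=3.0)
```

Calling the function at $t = 1$ gives the discrete values that the initial condition is later interpolated from.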
Constructing the Initial Condition $\phi(x)$:
- The model requires a continuous, twice-differentiable initial function $\phi(x)$ with zero slope at the boundaries ($l, L$).
- Real data provides discrete density values $I(x, t=1)$ only at integer distances $x$.
- Use cubic spline interpolation on the discrete initial density data points $(x, I(x, t=1))$ to create a smooth, piecewise cubic function $\phi(x)$. This satisfies the differentiability requirement.
- Manually ensure the slopes at the minimum ($l$) and maximum ($L$) distances considered are zero (e.g., by setting the derivative of the spline to zero at the endpoints or by extending the data slightly with constant values).
- Ensure the condition $d \phi''+r \phi(1-\frac{\phi}{K}) \geq 0$ holds. The paper notes this is often satisfied if $\phi(x)$ is largely convex or if $K$ is large and $d$ is small relative to $r$.
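
One way to build such a $\phi(x)$ is SciPy's `CubicSpline` with `bc_type="clamped"`, which forces the first derivative to zero at both ends. This is a sketch under assumptions: the density values are made up, the helper `satisfies_growth_condition` is hypothetical, and the paper does not say which software it used.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Made-up first-hour densities (percent) at hop distances 1..6
x_obs = np.array([1, 2, 3, 4, 5, 6], dtype=float)
I_obs = np.array([5.8, 1.7, 1.9, 0.8, 0.4, 0.2])

# "clamped" sets the first derivative to zero at both endpoints (Neumann-compatible)
phi = CubicSpline(x_obs, I_obs, bc_type="clamped")


def satisfies_growth_condition(phi, d, r, K, l, L, n=201):
    """Check d*phi'' + r*phi*(1 - phi/K) >= 0 on a grid of n points in [l, L]."""
    xs = np.linspace(l, L, n)
    values = phi(xs)
    curvature = phi(xs, 2)  # second derivative of the spline
    lhs = d * curvature + r * values * (1.0 - values / K)
    return bool(np.all(lhs >= 0.0))


# r = 1.65 is the hop-based r(t) from the text evaluated at t = 1
ok = satisfies_growth_condition(phi, d=0.01, r=1.65, K=25.0, l=1.0, L=6.0)
```

A cubic spline can overshoot between data points, so it is worth also checking that `phi` stays between 0 and $K$ on the same grid before using it as an initial condition.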
Parameter Estimation ($d, r, K$):
- $K$ (Carrying Capacity): Can be estimated from historical data or set based on the maximum observed density in the initial phase or similar past cascades. In the paper's Digg experiment, $K=25$ (for hops) and $K=60$ (for interests) were chosen based on observation.
- $d$ (Diffusion Rate): Controls how much the density profile smooths out across distances. This can be tuned empirically. Values like $d=0.01$ (hops) and $d=0.05$ (interests) were used.
- $r$ (Growth Rate): Controls the speed of density increase within a distance group. The paper observed that the rate of increase slows over time. Therefore, they modeled $r$ as a decreasing function of time. Specific functions like $r(t) = 1.4e^{-1.5(t-1)} + 0.25$ (hops) and $r(t)=1.6e^{-(t-1)}+0.1$ (interests) were used, likely fitted to match the observed growth patterns in the Digg data.
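
For reference, the reported parameter choices can be written down directly. The dictionary layout and function names below are illustrative; the numeric values are the ones quoted above for the Digg experiment.

```python
import math


def r_hops(t):
    """Growth rate for the friendship-hop metric: 1.4*exp(-1.5*(t-1)) + 0.25."""
    return 1.4 * math.exp(-1.5 * (t - 1)) + 0.25


def r_interest(t):
    """Growth rate for the shared-interest metric: 1.6*exp(-(t-1)) + 0.1."""
    return 1.6 * math.exp(-(t - 1)) + 0.1


# Hypothetical container for the per-metric settings quoted in the text
PARAMS = {
    "hops": {"d": 0.01, "K": 25.0, "r": r_hops},
    "interest": {"d": 0.05, "K": 60.0, "r": r_interest},
}

# Growth rate at each hour from t = 1 to t = 6 for the hop metric
hourly_r = [PARAMS["hops"]["r"](t) for t in range(1, 7)]
```

Both functions start high at $t = 1$ and decay toward a constant floor (0.25 and 0.1 respectively), matching the observation that growth slows over time.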

Solving the PDE:

The DL equation is a non-linear PDE. It typically requires numerical methods for solving. Common approaches include the Finite Difference Method (FDM) or Finite Element Method (FEM).
Using FDM, you would discretize both time $t$ and distance $x$ , approximate the derivatives ( $\frac{\partial I}{\partial t}$ , $\frac{\partial^2 I}{\partial x^2}$ ) using finite differences, and iteratively compute $I(x, t+\Delta t)$ based on values at time $t$ . An implicit or explicit time-stepping scheme (like Forward Euler, Backward Euler, or Crank-Nicolson) would be chosen.

Pseudocode for Numerical Solution (Conceptual using Forward Euler):

# Discretize distance x into points x_i (i = 0 to N), spacing delta_x
# Discretize time t into steps t_j (j = 0 to M), starting at t_0 = 1, spacing delta_t
# Forward Euler is only stable for the diffusion term if d * delta_t / delta_x^2 <= 1/2
# Initialize I[i] = phi(x_i) for all i at t = t_0

for j from 0 to M-1: # Time steps
    t = t_j
    calculate r(t) # If r depends on time
    I_new = copy of I
    for i from 1 to N-1: # Interior spatial points (boundaries handled below)
        # Approximate second derivative (central difference)
        I_xx = (I[i+1] - 2*I[i] + I[i-1]) / (delta_x)^2
        # Right-hand side of the DL equation, i.e. the time derivative
        I_t = d * I_xx + r(t) * I[i] * (1 - I[i] / K)
        # Forward-difference update for the next time step
        I_new[i] = I[i] + delta_t * I_t

    # Neumann boundary conditions (zero slope), simplest first-order form
    I_new[0] = I_new[1]
    I_new[N] = I_new[N-1]
    # More accurate methods exist for boundary implementation (e.g., ghost points).

    # Update I for the next iteration
    I = I_new
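
A runnable NumPy version of the same explicit scheme might look like the sketch below. It is not the paper's solver: the function name `solve_dl`, the grid sizes, and the Gaussian-shaped initial profile are assumptions made for illustration, and the zero-slope boundaries are imposed by mirroring the neighboring interior value (a ghost-point treatment) rather than by copying it.

```python
import numpy as np


def solve_dl(phi_values, x, t_end, d, K, r, t_start=1.0, dt=0.001):
    """Forward Euler in time, central differences in space, for
    I_t = d*I_xx + r(t)*I*(1 - I/K) with zero-slope (Neumann) boundaries.
    Returns the density profile I(x, t_end) on the uniform grid x."""
    dx = x[1] - x[0]
    if d * dt / dx**2 > 0.5:
        raise ValueError("explicit scheme unstable: need d*dt/dx^2 <= 0.5")
    I = np.asarray(phi_values, dtype=float).copy()
    n_steps = int(round((t_end - t_start) / dt))
    for j in range(n_steps):
        t = t_start + j * dt
        # Mirror the first interior neighbors into ghost cells so dI/dx = 0 at both ends
        padded = np.concatenate(([I[1]], I, [I[-2]]))
        I_xx = (padded[2:] - 2.0 * padded[1:-1] + padded[:-2]) / dx**2
        I = I + dt * (d * I_xx + r(t) * I * (1.0 - I / K))
    return I


# Illustrative setup: hop distances 1..6, made-up initial profile with flat ends
x = np.linspace(1.0, 6.0, 51)
phi_values = 1.0 + 4.0 * np.exp(-((x - 1.0) ** 2))
K, d = 25.0, 0.01


def r(t):
    # Time-dependent growth rate quoted in the text for the hop metric
    return 1.4 * np.exp(-1.5 * (t - 1.0)) + 0.25


I_6 = solve_dl(phi_values, x, t_end=6.0, d=d, K=K, r=r)
```

The explicit scheme is easy to read but forces a small time step when the grid is fine; an implicit scheme such as Backward Euler or Crank-Nicolson relaxes that restriction at the cost of solving a linear system per step.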

Experimental Validation (Digg Dataset)

The paper validated the DL model using a dataset from Digg (June 2009): 3553 popular news stories, >3M votes, >139k users, and their friendship links.

Observations:
- Density patterns varied significantly between stories.
- Using friendship hops, density didn't always decrease monotonically with distance (e.g., density at hop 3 could be higher than hop 2), supporting the need for a diffusion term ($d \frac{\partial^2 I}{\partial x^2}$) alongside direct propagation.
- Using shared interests, density generally decreased as the interest distance increased, confirming its relevance.
- Density tended to saturate over time (typically within 10-50 hours for popular stories).
Prediction Results:
- The model was initialized with $\phi(x)$ built from the first hour ($t=1$) of a story's spread.
- Predictions were generated for subsequent hours ($t=2$ to $t=6$).
- For the most popular story (s1, ~24k votes), using friendship hops, the average prediction accuracy (defined as $1 - \frac{|\text{predicted} - \text{actual}|}{\text{actual}}$) over distances 1-6 and hours 2-6 was 92.08%. Accuracy was very high (98.27%) for direct followers (distance 1).
- Using shared interests for the same story, accuracy was also high for distances 1-4 (91-97%), but dropped significantly for distance 5, suggesting the model might need refinement (e.g., making parameters $d, r, K$ also dependent on distance $x$).
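
The accuracy measure quoted above is straightforward to compute. In this sketch the function names, the `(distance, hour)` keying, and the toy numbers are all hypothetical; raising an error when the actual density is zero is also an assumption, since the formula is undefined there.

```python
def prediction_accuracy(predicted, actual):
    """Accuracy as defined in the text: 1 - |predicted - actual| / actual."""
    if actual == 0:
        raise ValueError("accuracy is undefined when the actual density is zero")
    return 1.0 - abs(predicted - actual) / actual


def mean_accuracy(predicted, actual):
    """Average accuracy over every (distance, hour) cell present in `actual`."""
    scores = [prediction_accuracy(predicted[key], actual[key]) for key in actual]
    return sum(scores) / len(scores)


# Toy densities keyed by (distance, hour); values are made up
actual = {(1, 2): 10.0, (2, 2): 4.0}
predicted = {(1, 2): 9.5, (2, 2): 4.4}
avg = mean_accuracy(predicted, actual)
```

Because the error is divided by the actual density, cells with very small densities (typically the largest distances) can pull the average down sharply even when the absolute error is small.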

Practical Applications and Significance

Predictive Power: The model allows forecasting the spatial reach and density of influence over time, based on early observations. This goes beyond just predicting the final total number of influenced users.
Understanding Diffusion Dynamics: Helps disentangle local growth (within similar distances) from broader diffusion (across distances), offering insights into how different network structures or content types spread.
Potential Uses:
- Marketing: Predict campaign reach across different network segments.
- Public Health/Info Campaigns: Estimate how far and fast information (or misinformation) might spread.
- Platform Design: Understand how features (like recommendation algorithms or front-page promotion) impact spatio-temporal diffusion patterns.

Limitations and Considerations

Parameter Sensitivity: The model's accuracy depends heavily on correctly estimating $d, r, K$ and constructing $\phi(x)$. The paper used fixed values or simple time-dependent functions; real-world application might require more complex, adaptive parameter estimation.
Computational Cost: Solving PDEs numerically can be computationally intensive, especially for large networks or long time durations.
Distance Metric Choice: The effectiveness depends on choosing an appropriate distance metric for the specific network and type of information.
Network Dynamics: The model assumes a static network structure during the diffusion process, which might not hold for longer timescales.
Homogeneity Assumption: The parameters $d, r, K$ are assumed to be the same at every distance $x$ (only $r$ varies, and only with time). As noted in the future work, making them functions of both $x$ and $t$ could improve accuracy.

In summary, the Diffusive Logistic model provides a valuable theoretical and practical framework for analyzing and predicting information spread in both time and space within online social networks, offering richer insights than purely temporal models. Its implementation requires careful data preparation, construction of the initial state via interpolation, parameter tuning based on observed dynamics, and numerical PDE solving techniques.


Authors (3)

Feng Wang
Haiyan Wang
Kuai Xu
