Diffusive Logistic Model Towards Predicting Information Diffusion in Online Social Networks (1108.0442v1)

Published 1 Aug 2011 in cs.SI, math.AP, and physics.soc-ph

Abstract: Online social networks have recently become an effective and innovative channel for spreading information and influence among hundreds of millions of end users. Many prior work have carried out empirical studies and proposed diffusion models to understand the information diffusion process in online social networks. However, most of these studies focus on the information diffusion in temporal dimension, that is, how the information propagates over time. Little attempt has been given on understanding information diffusion over both temporal and spatial dimensions. In this paper, we propose a Partial Differential Equation (PDE), specifically, a Diffusive Logistic (DL) equation to model the temporal and spatial characteristics of information diffusion in online social networks. To be more specific, we develop a PDE-based theoretical framework to measure and predict the density of influenced users at a given distance from the original information source after a time period. The density of influenced users over time and distance provides valuable insight on the actual information diffusion process. We present the temporal and spatial patterns in a real dataset collected from Digg social news site, and validate the proposed DL equation in terms of predicting the information diffusion process. Our experiment results show that the DL model is indeed able to characterize and predict the process of information propagation in online social networks. For example, for the most popular news with 24,099 votes in Digg, the average prediction accuracy of DL model over all distances during the first 6 hours is 92.08%. To the best of our knowledge, this paper is the first attempt to use PDE-based model to study the information diffusion process in both temporal and spatial dimensions in online social networks.

This paper, "Diffusive Logistic Model Towards Predicting Information Diffusion in Online Social Networks" (Wang et al., 2011), uses Partial Differential Equations (PDEs) to model and predict how information spreads through online social networks, considering time and network distance simultaneously. Previous work often focused solely on the temporal aspect (how many people are influenced over time). This paper addresses the spatio-temporal diffusion problem: determining the density of influenced users at a specific distance from the source after a certain time.

Core Concept: The Diffusive Logistic (DL) Model

The central idea is to model information spread as two interacting processes:

  1. Growth Process: Information spreading among users who are at the same distance from the source. This is modeled using the standard logistic growth equation, commonly used in population dynamics: $\frac{\partial I}{\partial t} = rI\left(1-\frac{I}{K}\right)$. Here, $I$ is the density of influenced users, $r$ is the intrinsic growth rate (how fast influence spreads within the group), and $K$ is the carrying capacity (the maximum possible density).
  2. Diffusion Process: Information spreading randomly between users at different distances from the source. This captures spread beyond direct friend-of-friend links, like discovering content on a front page or via search. This is modeled using Fick's law of diffusion, which contributes the term $d \frac{\partial^2 I}{\partial x^2}$, where $d$ is the diffusion rate (how fast information travels across distances) and $x$ represents the distance.

Combining these leads to the Diffusive Logistic (DL) equation:

$$\frac{\partial I}{\partial t} = d \frac{\partial^2 I}{\partial x^2} + r I\left(1-\frac{I}{K}\right)$$

This PDE describes how the density of influenced users $I(x,t)$ changes over time $t$ and distance $x$. The model includes:

  • Initial Condition: $I(x, 1) = \phi(x)$, representing the observed density distribution at the start time (e.g., $t=1$ hour).
  • Boundary Conditions: $\frac{\partial I}{\partial x}(l,t) = \frac{\partial I}{\partial x}(L,t) = 0$, where $l$ and $L$ are the minimum and maximum distances considered. This is a Neumann (zero-flux) boundary condition, meaning no information flows across the defined distance boundaries (it stays within the network).

The paper proves two key properties of this model:

  • Unique Property: The model guarantees a unique, positive solution for $I(x,t)$ bounded between 0 and $K$.
  • Strictly Increasing Property: If the initial density $\phi(x)$ meets certain conditions, the density $I(x,t)$ will strictly increase over time, aligning with the intuition that influence spreads but doesn't retract.
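
One way to read the condition on $\phi$ is to evaluate the DL equation at the start time $t=1$, where $I(x,1)=\phi(x)$. The worked step below is only a sketch of that substitution. The inequality is the requirement on $\phi$ listed under the implementation details, and here it is assumed to be the "certain conditions" meant above; the argument that carries it forward to later times is in the paper and is not reproduced here.

```latex
\left.\frac{\partial I}{\partial t}\right|_{t=1}
  = d\,\frac{\partial^2 I}{\partial x^2}(x,1)
    + r\,I(x,1)\left(1-\frac{I(x,1)}{K}\right)
  = d\,\phi''(x) + r\,\phi(x)\left(1-\frac{\phi(x)}{K}\right) \;\ge\; 0
```

In words: requiring $d\,\phi'' + r\,\phi(1-\phi/K) \ge 0$ means the density is not decreasing anywhere at the initial time.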

Defining Distance in Social Networks

Since "distance" isn't inherently spatial in online networks, the paper proposes and evaluates two metrics:

  1. Friendship Hops: The shortest path length (number of friendship links) between the information source (initiator) and another user in the network graph.
  2. Shared Interests: A measure of dissimilarity based on content interaction history (e.g., voted/digged stories). Defined as $d_{a,b} = 1 - \frac{|C_a \cap C_b|}{|C_a \cup C_b|}$, where $C_a$ and $C_b$ are the sets of content interacted with by users $a$ and $b$. A lower value means higher shared interest (closer distance).
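
Both metrics are straightforward to compute. The following is a minimal Python sketch, not the paper's code: the function names, the adjacency-dictionary graph format, and the toy data are assumptions made for illustration, and the treatment of two users with empty histories is a choice the summary does not specify.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first search: shortest number of friendship links
    from `source` to every reachable user.

    graph: dict mapping each user to an iterable of neighbours
           (hypothetical input format).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:          # first visit is the shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def interest_distance(c_a, c_b):
    """d_{a,b} = 1 - |C_a ∩ C_b| / |C_a ∪ C_b| for two sets of content ids."""
    union = c_a | c_b
    if not union:
        # Assumption: users with no history are treated as maximally distant.
        return 1.0
    return 1.0 - len(c_a & c_b) / len(union)


# Toy data, invented for illustration only.
graph = {"src": ["u1", "u2"], "u1": ["src", "u3"], "u2": ["src"], "u3": ["u1"]}
print(hop_distances(graph, "src"))   # {'src': 0, 'u1': 1, 'u2': 1, 'u3': 2}
print(interest_distance({"s1", "s2", "s3"}, {"s2", "s3", "s4"}))  # 0.5
```

Users unreachable from the source simply do not appear in the result of `hop_distances`; how to bin them is left to the caller.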

Implementation Details

Implementing the DL model involves several practical steps:

  1. Data Preparation:
    • Identify the source/initiator of an information cascade (e.g., the first user to vote for a story).
    • Build the social network graph (friendship links).
    • For each user who gets influenced (e.g., votes), record the timestamp.
    • Calculate the distance ($x$) from the source to every other user using the chosen metric (friendship hops or shared interests). Pre-calculating shortest paths (e.g., using Breadth-First Search for hops) is necessary. Calculating shared interests requires access to user-item interaction histories.
    • Group users by distance $x$.
    • Calculate the density $I(x,t)$ at discrete time points $t$: (number of influenced users at distance $x$ by time $t$) / (total number of users at distance $x$).
  2. Constructing the Initial Condition $\phi(x)$:
    • The model requires a continuous, twice-differentiable initial function $\phi(x)$ with zero slope at the boundaries $l$ and $L$.
    • Real data provides discrete density values $I(x, t=1)$ only at integer distances $x$.
    • Use cubic spline interpolation on the discrete initial density data points $(x, I(x, t=1))$ to create a smooth, piecewise cubic function $\phi(x)$. This satisfies the differentiability requirement.
    • Ensure the slopes at the minimum ($l$) and maximum ($L$) distances considered are zero (e.g., by using a clamped spline whose first derivative is set to zero at the endpoints, or by extending the data slightly with constant values).
    • Ensure the condition $d \phi'' + r \phi\left(1-\frac{\phi}{K}\right) \geq 0$ holds. The paper notes this is often satisfied if $\phi(x)$ is largely convex or if $K$ is large and $d$ is small relative to $r$.
  3. Parameter Estimation ($d, r, K$):
    • $K$ (Carrying Capacity): Can be estimated from historical data or set based on the maximum observed density in the initial phase or similar past cascades. In the paper's Digg experiment, $K=25$ (for hops) and $K=60$ (for interests) were chosen based on observation.
    • $d$ (Diffusion Rate): Controls how much the density profile smooths out across distances. This can be tuned empirically. Values like $d=0.01$ (hops) and $d=0.05$ (interests) were used.
    • $r$ (Growth Rate): Controls the speed of density increase within a distance group. The paper observed that the rate of increase slows over time and therefore modeled $r$ as a decreasing function of time. Specific functions like $r(t) = 1.4e^{-1.5(t-1)} + 0.25$ (hops) and $r(t) = 1.6e^{-(t-1)} + 0.1$ (interests) were used, likely fitted to match the observed growth patterns in the Digg data.
  4. Solving the PDE:
    • The DL equation is a non-linear PDE, so it typically has to be solved numerically. Common approaches include the Finite Difference Method (FDM) or Finite Element Method (FEM).
    • Using FDM, you would discretize both time $t$ and distance $x$, approximate the derivatives ($\frac{\partial I}{\partial t}$, $\frac{\partial^2 I}{\partial x^2}$) using finite differences, and iteratively compute $I(x, t+\Delta t)$ based on values at time $t$. An implicit or explicit time-stepping scheme (like Forward Euler, Backward Euler, or Crank-Nicolson) would be chosen.

    Python sketch of the numerical solution (conceptual, using Forward Euler; phi, r, d, K and the grid x are supplied by the caller):

    import numpy as np

    def solve_dl(phi, r, d, K, x, t_start, t_end, n_steps):
        """Advance I(x, t) from t_start to t_end on the grid x with Forward Euler."""
        dx = x[1] - x[0]                    # uniform grid spacing assumed
        dt = (t_end - t_start) / n_steps
        # Explicit schemes are only conditionally stable: keep
        # d * dt / dx**2 <= 0.5 (and dt small relative to 1 / r).
        I = np.array([phi(xi) for xi in x], dtype=float)  # I(x, t_start) = phi(x)
        for j in range(n_steps):            # time steps
            t = t_start + j * dt
            r_t = r(t)                      # growth rate may depend on time
            I_new = np.empty_like(I)
            # Second spatial derivative at interior points (central difference)
            I_xx = (I[2:] - 2.0 * I[1:-1] + I[:-2]) / dx**2
            # Right-hand side of the DL equation: diffusion + logistic growth
            I_t = d * I_xx + r_t * I[1:-1] * (1.0 - I[1:-1] / K)
            # Forward difference in time
            I_new[1:-1] = I[1:-1] + dt * I_t
            # Neumann boundaries (zero slope), simplest first-order treatment;
            # more accurate boundary schemes exist.
            I_new[0] = I_new[1]
            I_new[-1] = I_new[-2]
            I = I_new                       # carry the state to the next step
        return I
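
The density calculation from the data preparation step can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the paper's code: the function name, the two dictionary inputs, and the toy records are hypothetical, and densities are returned as plain fractions exactly as the ratio above defines them.

```python
from collections import defaultdict


def density_by_distance(distance, vote_time, t):
    """Return {x: influenced users at distance x by time t / all users at distance x}.

    distance:  dict user -> distance x from the source (hypothetical input)
    vote_time: dict user -> time of the user's vote; users who never
               voted are simply absent (hypothetical input)
    """
    total = defaultdict(int)
    influenced = defaultdict(int)
    for user, x in distance.items():
        total[x] += 1
        if user in vote_time and vote_time[user] <= t:
            influenced[x] += 1
    return {x: influenced[x] / total[x] for x in sorted(total)}


# Toy records, invented for illustration only (times in hours).
distance = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2, "f": 3}
vote_time = {"a": 0.5, "b": 2.0, "c": 0.8, "f": 4.0}

for t in (1.0, 5.0):
    print(t, density_by_distance(distance, vote_time, t))
```

Evaluating this at the start time gives the discrete points that the spline interpolation turns into the initial condition; evaluating it at later times gives the observed densities that predictions are compared against.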

Experimental Validation (Digg Dataset)

The paper validated the DL model using a dataset from Digg (June 2009): 3553 popular news stories, >3M votes, >139k users, and their friendship links.

  • Observations:

    • Density patterns varied significantly between stories.
    • Using friendship hops, density didn't always decrease monotonically with distance (e.g., density at hop 3 could be higher than hop 2), supporting the need for a diffusion term ($d \frac{\partial^2 I}{\partial x^2}$) alongside direct propagation.
    • Using shared interests, density generally decreased as the interest distance increased, confirming its relevance.
    • Density tended to saturate over time (typically within 10-50 hours for popular stories).
  • Prediction Results:
    • The model was initialized using data from the first hour ($t=1$) of a story's spread ($\phi(x)$).
    • Predictions were generated for subsequent hours ($t=2$ to $t=6$).
    • For the most popular story (s1, 24,099 votes), using friendship hops, the average prediction accuracy (defined as $1 - \frac{|\text{predicted} - \text{actual}|}{\text{actual}}$) over distances 1-6 and hours 2-6 was 92.08%. Accuracy was very high (98.27%) for direct followers (distance 1).
    • Using shared interests for the same story, accuracy was also high for distances 1-4 (91-97%), but dropped significantly for distance 5, suggesting the model might need refinement (e.g., making parameters $d, r, K$ also dependent on distance $x$).
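
The accuracy measure defined above is simple to compute per (distance, time) cell and then average. The snippet below is a minimal sketch: the function names, the dictionary layout keyed by (distance, hour), and the numbers are all invented for illustration and are not taken from the Digg data.

```python
def prediction_accuracy(predicted, actual):
    """1 - |predicted - actual| / actual for a single (distance, time) cell."""
    if actual == 0:
        raise ValueError("accuracy is undefined when the actual density is zero")
    return 1.0 - abs(predicted - actual) / actual


def mean_accuracy(pred, obs):
    """Average accuracy over every (distance, time) key present in obs."""
    scores = [prediction_accuracy(pred[key], obs[key]) for key in obs]
    return sum(scores) / len(scores)


# Invented densities keyed by (distance, hour); not real Digg data.
obs = {(1, 2): 10.0, (2, 2): 4.0}
pred = {(1, 2): 9.0, (2, 2): 5.0}
print(mean_accuracy(pred, obs))
```

Because the error is divided by the actual density, cells with very small observed densities can dominate the average, which is worth keeping in mind when reading accuracy at the largest distances.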

Practical Applications and Significance

  • Predictive Power: The model allows forecasting the spatial reach and density of influence over time, based on early observations. This goes beyond just predicting the final total number of influenced users.
  • Understanding Diffusion Dynamics: Helps disentangle local growth (within similar distances) from broader diffusion (across distances), offering insights into how different network structures or content types spread.
  • Potential Uses:
    • Marketing: Predict campaign reach across different network segments.
    • Public Health/Info Campaigns: Estimate how far and fast information (or misinformation) might spread.
    • Platform Design: Understand how features (like recommendation algorithms or front-page promotion) impact spatio-temporal diffusion patterns.

Limitations and Considerations

  • Parameter Sensitivity: The model's accuracy depends heavily on correctly estimating $d, r, K$ and constructing $\phi(x)$. The paper used fixed values or simple time-dependent functions; real-world application might require more complex, adaptive parameter estimation.
  • Computational Cost: Solving PDEs numerically can be computationally intensive, especially for large networks or long time durations.
  • Distance Metric Choice: The effectiveness depends on choosing an appropriate distance metric for the specific network and type of information.
  • Network Dynamics: The model assumes a static network structure during the diffusion process, which might not hold for longer timescales.
  • Homogeneity Assumption: The parameters $d, r, K$ are initially assumed to be uniform across a given distance $x$. As noted in the future work, making them functions of both $x$ and $t$ could improve accuracy.

In summary, the Diffusive Logistic model provides a valuable theoretical and practical framework for analyzing and predicting information spread in both time and space within online social networks, offering richer insights than purely temporal models. Its implementation requires careful data preparation, construction of the initial state via interpolation, parameter tuning based on observed dynamics, and numerical PDE solving techniques.

Authors (3)
  1. Feng Wang
  2. Haiyan Wang
  3. Kuai Xu