Inferring Networks of Diffusion and Influence (1006.0234v3)

Published 1 Jun 2010 in cs.DS, cs.SI, physics.soc-ph, and stat.ML

Abstract: Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected with a virus or adopt the information, observing individual transmissions (i.e., who infects whom, or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and finds provably near-optimal networks. We demonstrate the effectiveness of our approach by tracing information diffusion in a set of 170 million blogs and news articles over a one year period to infer how information flows through the online media space. We find that the diffusion network of news for the top 1,000 media sites and blogs tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.


Inferring Networks of Diffusion and Influence

The study of diffusion and influence processes in networks addresses a fundamental phenomenon that appears in many applications, such as the diffusion of technological innovations, viral marketing, and the spread of news or diseases. The paper "Inferring Networks of Diffusion and Influence," authored by Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Krause, contributes a method to trace paths of diffusion and to infer unobserved propagation networks. The goal is to find the network that best explains the observed infection or adoption times.

Methodology Outline

The core challenge is that individual transmissions between nodes are typically unobservable, even when the times at which nodes become infected or adopt information are known. To address this, the paper proposes a generative probabilistic model of network diffusion and an efficient approximation algorithm, termed NetInf, that infers near-optimal diffusion networks.

Generative Model for Cascade Diffusion

The authors employ a probabilistic model inspired by the Independent Cascade Model, extending it to incorporate temporal information. They assume that infections propagate along directed edges of a static network, but that the exact paths remain unobserved. Network inference is then posed as an optimization problem: find the network, with a bounded number of edges, that maximizes the likelihood of the observed infection times. Because this problem is NP-hard, the authors resort to approximation.

Cascade Transmission Model: The model assumes that a contagion spreads from one node to another with a probability that depends on the difference between their infection times; a node can only be infected by a node that was infected earlier. The authors consider both exponential and power-law distributions for the incubation times.
Cascade Likelihood: The likelihood of a cascade spreading in a given network is computed by considering all possible propagation trees. Since evaluating this sum directly is computationally prohibitive, the algorithm approximates it with the single most probable propagation tree.
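
The two ideas above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' implementation: the function names, the default `alpha` values, and the dictionary-based cascade format are hypothetical, the transmission weights are left unnormalized, and refinements of the full model (such as accounting for edges along which the contagion did not spread) are omitted. Because a node can only be infected by an earlier node, the most probable tree can be found by letting each node independently pick its most likely parent.

```python
import math

def exp_weight(dt, alpha=1.0):
    """Unnormalized exponential transmission weight, exp(-dt / alpha)."""
    return math.exp(-dt / alpha) if dt > 0 else 0.0

def power_law_weight(dt, alpha=2.0):
    """Unnormalized power-law transmission weight, dt ** (-alpha)."""
    return dt ** (-alpha) if dt > 0 else 0.0

def best_tree(cascade, edges, weight=exp_weight):
    """Most probable propagation tree of one cascade, restricted to `edges`.

    cascade: dict mapping node -> infection time (hypothetical format).
    edges:   set of directed (u, v) pairs.
    Returns (parents, log_likelihood). Nodes without a feasible parent
    are treated as roots and contribute nothing to the log-likelihood.
    """
    parents = {}
    loglik = 0.0
    for v, tv in cascade.items():
        best_u, best_w = None, 0.0
        for u, tu in cascade.items():
            if (u, v) in edges:
                w = weight(tv - tu)  # zero unless u was infected before v
                if w > best_w:
                    best_u, best_w = u, w
        if best_u is not None:
            parents[v] = best_u
            loglik += math.log(best_w)
    return parents, loglik

cascade = {"a": 0.0, "b": 1.0, "c": 1.5}
edges = {("a", "b"), ("a", "c"), ("b", "c")}
parents, loglik = best_tree(cascade, edges)
print(parents, loglik)
```

In this toy cascade, node `c` is attributed to `b` rather than `a`, because the shorter time gap gives a larger exponential weight.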

Approximation Algorithm - NetInf

Because exact optimization is impractical, NetInf exploits the submodularity of the objective function. This property allows a greedy algorithm to guarantee a solution whose objective value is at least a constant fraction, specifically $1 - 1/e$ (about 63%), of the optimum.
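
For reference, the standard statement behind this guarantee can be written as follows. The symbols are introduced here for illustration and are not taken from the summary: $F$ is the objective over edge sets, $E$ is the set of candidate edges, $k$ is the edge budget, and $\hat{A}$ is the greedy solution. Submodularity is the diminishing-returns property

$$F(A \cup \{e\}) - F(A) \;\ge\; F(B \cup \{e\}) - F(B) \quad \text{for all } A \subseteq B \subseteq E,\ e \in E \setminus B,$$

and for a monotone submodular $F$ with $F(\emptyset) = 0$, the greedy solution with $k$ elements satisfies

$$F(\hat{A}) \;\ge\; \left(1 - \frac{1}{e}\right) \max_{|A| \le k} F(A).$$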

Incremental Edge Addition: At each step, NetInf adds the edge that yields the largest marginal increase in the likelihood of the observed cascades, which keeps the search tractable.
Efficiency Enhancements: The algorithm is accelerated with lazy evaluation of marginal gains and localized updates, making it scalable to large datasets.
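
The greedy selection with lazy evaluation can be sketched as below. This is a hedged illustration rather than NetInf itself: the objective is a simplified stand-in (for every infected node, the log-weight of its best selected parent, relative to a small baseline `EPS` that represents an unexplained infection), and all names and constants are assumptions made for the example. The lazy step relies on submodularity: a stale marginal gain is an upper bound on the current one, so an edge whose refreshed gain still beats the next stale bound can be accepted without re-evaluating the rest.

```python
import heapq
import math

EPS = 1e-6  # assumed baseline weight for an infection with no selected parent

def candidate_edges(cascades, alpha=1.0):
    """Map each time-respecting edge (u, v) to its (cascade index, log-weight) pairs."""
    cand = {}
    for ci, cascade in enumerate(cascades):
        for u, tu in cascade.items():
            for v, tv in cascade.items():
                if tu < tv:
                    log_w = max(-(tv - tu) / alpha, math.log(EPS))
                    cand.setdefault((u, v), []).append((ci, log_w))
    return cand

def greedy_network(cascades, k):
    """Pick up to k edges greedily, using lazily refreshed marginal gains."""
    cand = candidate_edges(cascades)
    base = math.log(EPS)
    best = {}  # (cascade index, node) -> best log-weight among chosen parents

    def gain(edge):
        v = edge[1]
        return sum(max(0.0, lw - best.get((ci, v), base)) for ci, lw in cand[edge])

    # Max-heap via negated gains; the stored gains may become stale.
    heap = [(-gain(e), e) for e in cand]
    heapq.heapify(heap)
    chosen = []
    while heap and len(chosen) < k:
        _, edge = heapq.heappop(heap)
        g = gain(edge)  # refresh the possibly stale bound
        if heap and g < -heap[0][0]:
            heapq.heappush(heap, (-g, edge))  # another edge may be better
            continue
        if g <= 0:
            break  # no remaining edge improves the objective
        chosen.append(edge)
        for ci, lw in cand[edge]:  # localized update: only this edge's target
            key = (ci, edge[1])
            best[key] = max(best.get(key, base), lw)
    return chosen

cascades = [
    {"a": 0.0, "b": 1.0, "c": 2.0},
    {"a": 0.0, "b": 1.0},
    {"b": 0.0, "c": 1.0},
]
print(greedy_network(cascades, 2))
```

On these three toy cascades the edges `a -> b` and `b -> c` are selected; once they are in the network, the edge `a -> c` adds nothing, so asking for a third edge returns the same two.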

Experimental Validation

The authors validate NetInf on both synthetic and real datasets:

Synthetic Networks: They employ networks generated via the Forest Fire and Kronecker Graph models with varied structures (random, hierarchical, core-periphery) to simulate cascades. The experiments demonstrate NetInf's superior accuracy, achieving high break-even points and area under the curve (AUC) metrics compared to baseline heuristics. The results emphasize NetInf's robustness across different network topologies and infection models.
Real Datasets: They apply NetInf to a dataset encompassing 172 million blog posts and news articles, employing both hyperlink-based and phrase-based (MemeTracker) cascades. The inferred networks align well with realistic propagation dynamics, highlighting predominant pathways from mainstream media to blogs and illustrating the more rapid spread among media outlets compared to blogs.
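
For the synthetic experiments, where the true network is known, the edge-recovery metrics mentioned above can be computed along the following lines. This is a minimal sketch with hypothetical function names and toy data. It assumes the inferred edges are ranked in the order the algorithm added them, and it takes the break-even point to be the precision (equal to the recall) obtained when as many edges are predicted as exist in the true network.

```python
def precision_recall_curve(ranked_edges, true_edges):
    """Return (precision, recall) after each of the top-k ranked edges."""
    true_edges = set(true_edges)
    hits = 0
    curve = []
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
        curve.append((hits / k, hits / len(true_edges)))
    return curve

def break_even_point(ranked_edges, true_edges):
    """Precision at k = number of true edges, where precision equals recall.

    Assumes ranked_edges contains at least as many edges as true_edges.
    """
    true_edges = set(true_edges)
    k = len(true_edges)
    hits = sum(1 for edge in ranked_edges[:k] if edge in true_edges)
    return hits / k

ranked = [("a", "b"), ("a", "c"), ("b", "c")]
truth = {("a", "b"), ("b", "c")}
print(precision_recall_curve(ranked, truth))
print(break_even_point(ranked, truth))
```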

Implications and Future Directions

The inferred networks provide insight into the structure of information dissemination on the Web, revealing a core-periphery organization in which a small set of highly influential sites diffuses information to the rest. Practically, knowing the likely pathways of diffusion could help in designing targeted interventions, whether to promote the spread of information or to contain it (e.g., in epidemiology).

Theoretically, the work stimulates further avenues for improving network inference approaches, particularly in:

Enhancing the Generative Model: Incorporating richer node and edge attributes could improve the model's accuracy in real-world scenarios.
Dynamic Networks: Adapting the method to networks whose structure changes over time.
Broader Applications: Extensions to biological, neurological, and social systems where diffusion processes are critical.

In conclusion, this research offers a scalable method for inferring networks from temporal diffusion data, with provable near-optimality guarantees, and provides insights applicable to both empirical and theoretical studies across various fields.


Authors (3)

Manuel Gomez-Rodriguez (40 papers)
Jure Leskovec (233 papers)
Andreas Krause (269 papers)

Citations (1,144)
