Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency (1404.0900v2)

Published 3 Apr 2014 in cs.SI and cs.DB

Abstract: Given a social network G and a constant k, the influence maximization problem asks for k nodes in G that (directly and indirectly) influence the largest number of nodes under a pre-defined diffusion model. This problem finds important applications in viral marketing, and has been extensively studied in the literature. Existing algorithms for influence maximization, however, either trade approximation guarantees for practical efficiency, or vice versa. In particular, among the algorithms that achieve constant factor approximations under the prominent independent cascade (IC) model or linear threshold (LT) model, none can handle a million-node graph without incurring prohibitive overheads. This paper presents TIM, an algorithm that aims to bridge the theory and practice in influence maximization. On the theory side, we show that TIM runs in O((k+\ell)(n+m) \log n / \epsilon^2) expected time and returns a (1-1/e-\epsilon)-approximate solution with at least 1 - n^{-\ell} probability. The time complexity of TIM is near-optimal under the IC model, as it is only a \log n factor larger than the \Omega(m + n) lower-bound established in previous work (for fixed k, \ell, and \epsilon). Moreover, TIM supports the triggering model, which is a general diffusion model that includes both IC and LT as special cases. On the practice side, TIM incorporates novel heuristics that significantly improve its empirical efficiency without compromising its asymptotic performance. We experimentally evaluate TIM with the largest datasets ever tested in the literature, and show that it outperforms the state-of-the-art solutions (with approximation guarantees) by up to four orders of magnitude in terms of running time. In particular, when k = 50, \epsilon = 0.2, and \ell = 1, TIM requires less than one hour on a commodity machine to process a network with 41.6 million nodes and 1.4 billion edges.


Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency

The paper "Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency" by Youze Tang, Xiaokui Xiao, and Yanchen Shi presents TIM, an algorithm for the influence maximization problem on large social networks that combines near-optimal time complexity with practical efficiency.

Background and Motivation

Influence maximization is a critical problem in social network analysis and viral marketing, where the goal is to identify a fixed number of nodes in a network that can maximize the spread of influence. Formally, given a social network $G$ and a constant $k$, the goal is to find $k$ nodes that influence the largest number of other nodes under a predefined diffusion model, such as the Independent Cascade (IC) model or the Linear Threshold (LT) model. Traditional methods, such as the Greedy algorithm proposed by Kempe et al., while effective in terms of approximation guarantees, are computationally prohibitive, especially for large networks.
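To make the objective concrete, the following is a minimal sketch of estimating the expected spread of a seed set under the IC model by Monte Carlo simulation. It is not code from the paper: the function names `simulate_ic` and `estimate_spread`, the adjacency format (node mapped to a list of `(neighbor, probability)` pairs), and the toy graph are all assumptions made for illustration.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one IC cascade and return the number of activated nodes.

    graph maps each node to a list of (neighbor, probability) pairs
    (a hypothetical format chosen for this sketch).
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # A newly activated node gets one chance to activate each
            # inactive out-neighbor, succeeding with probability p.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)


def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Average the cascade size over independent simulations."""
    rng = random.Random(seed)
    total = sum(simulate_ic(graph, seeds, rng) for _ in range(runs))
    return total / runs


# Toy graph (illustrative only): a -> b always, a -> c half the time, b -> d always.
toy_graph = {
    "a": [("b", 1.0), ("c", 0.5)],
    "b": [("d", 1.0)],
    "c": [],
    "d": [],
}
```

On the toy graph, seeding `"a"` always reaches `"b"` and `"d"` and reaches `"c"` about half the time, so `estimate_spread(toy_graph, ["a"])` should land near 3.5. Repeating such simulations for every candidate node in every greedy round is what makes the simulation-based Greedy approach expensive on large graphs.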

Contribution

The authors propose TIM, a novel approach that strikes a balance between theoretical rigor and practical efficiency. TIM's key contributions include:

Near-Optimal Time Complexity: TIM achieves an expected time complexity of $O((k + \ell)(n + m)\log n/\epsilon^2)$ and provides a $(1 - 1/e - \epsilon)$-approximate solution with at least $1 - n^{-\ell}$ probability.
Support for Triggering Models: TIM is versatile and supports the triggering model, encompassing both the IC and LT models as special cases.
Practical Efficiency: By incorporating several heuristic optimizations, TIM achieves substantial empirical efficiency, outperforming existing algorithms by up to four orders of magnitude in running time.

Algorithm Overview

TIM employs a two-phase approach:

Parameter Estimation: This phase computes a lower bound on the maximum expected spread among all size-$k$ node sets, which is then used to derive a parameter $\theta$.
Node Selection: A fixed number $\theta$ of random Reverse Reachable (RR) sets are sampled from $G$. The algorithm then solves a maximum coverage problem to select $k$ nodes that cover the largest number of RR sets.
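The node selection phase can be sketched as follows for the IC model. This is a simplified illustration, not the authors' implementation: it takes $\theta$ as a given input rather than deriving it from the parameter estimation phase, and the names `reverse_graph`, `sample_rr_set`, `greedy_max_cover`, and `select_seeds`, as well as the `(u, v, p)` edge-list format, are assumptions of this sketch.

```python
import random
from collections import deque


def reverse_graph(edges):
    """Map each node v to its in-neighbors as (u, p) pairs for edges u -> v."""
    incoming = {}
    for u, v, p in edges:
        incoming.setdefault(v, []).append((u, p))
        incoming.setdefault(u, [])
    return incoming


def sample_rr_set(incoming, nodes, rng):
    """Sample one RR set: pick a uniformly random root, then walk edges
    backwards, keeping each incoming edge with its IC probability."""
    root = rng.choice(nodes)
    rr = {root}
    frontier = deque([root])
    while frontier:
        v = frontier.popleft()
        for u, p in incoming[v]:
            if u not in rr and rng.random() < p:
                rr.add(u)
                frontier.append(u)
    return rr


def greedy_max_cover(rr_sets, k):
    """Greedily pick k nodes covering as many RR sets as possible.
    Returns the chosen nodes and the fraction of RR sets they cover."""
    member = {}  # node -> indices of the RR sets that contain it
    for i, rr in enumerate(rr_sets):
        for u in rr:
            member.setdefault(u, []).append(i)
    covered = [False] * len(rr_sets)
    seeds = []
    for _ in range(k):
        best, best_gain = None, -1
        for u, idxs in member.items():
            if u in seeds:
                continue
            gain = sum(1 for i in idxs if not covered[i])
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:
            break
        seeds.append(best)
        for i in member[best]:
            covered[i] = True
    return seeds, sum(covered) / len(rr_sets)


def select_seeds(edges, k, theta, seed=0):
    """Sample theta RR sets (theta > 0 assumed) and return the greedy seed
    set together with n times its covered fraction as a spread estimate."""
    rng = random.Random(seed)
    incoming = reverse_graph(edges)
    nodes = sorted(incoming)
    rr_sets = [sample_rr_set(incoming, nodes, rng) for _ in range(theta)]
    seeds, fraction = greedy_max_cover(rr_sets, k)
    return seeds, fraction * len(nodes)
```

For example, on a star graph such as `[("hub", "l1", 1.0), ("hub", "l2", 1.0)]`, every RR set contains `"hub"`, so `select_seeds(edges, 1, 100)` picks it. Nodes that appear in no edge are invisible to this sketch, and the linear scan in each greedy round is kept for readability rather than speed.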

Theoretical Guarantees

TIM builds on the RR-set sampling framework of Borgs et al. [3] to obtain its near-optimal time complexity. Chernoff bounds are used to show that, once enough RR sets have been sampled, the spread estimated from them is close to the actual expected influence spread with high probability.
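The estimate behind this argument can be written out as a short derivation. Here $I(S)$ denotes the spread of a seed set $S$, $R$ a random RR set, and $F_{\mathcal{R}}(S)$ the fraction of the $\theta$ sampled RR sets that $S$ covers; these symbols are notation chosen for this sketch. The key identity is that the chance of $S$ hitting a random RR set equals the expected fraction of nodes that $S$ influences:

$$
\mathbb{E}[I(S)] = n \cdot \Pr[S \cap R \neq \emptyset], \qquad \text{so} \qquad n \cdot F_{\mathcal{R}}(S) \approx \mathbb{E}[I(S)].
$$

As best recalled from the paper's analysis (the constants should be checked against the original before being relied on), the number of RR sets is chosen so that

$$
\theta \ge \frac{\lambda}{OPT}, \qquad \lambda = (8 + 2\epsilon)\, n \cdot \frac{\ell \log n + \log \binom{n}{k} + \log 2}{\epsilon^{2}},
$$

where $OPT$ is the maximum expected spread of any size-$k$ node set. Because $OPT$ is unknown, the parameter estimation phase substitutes a lower bound on it, which can only make $\theta$ larger and so preserves the guarantee.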

Practical Implications

TIM and its enhanced variant TIM+, which adds intermediate heuristic steps to improve parameter estimation, show the following strengths in the experimental evaluation:

Scalability: TIM processes networks with more than a billion edges on a commodity machine. For instance, for $k = 50$, $\epsilon = 0.2$, and $\ell = 1$, TIM processes a network with 41.6 million nodes and 1.4 billion edges in under one hour.
Empirical Performance: Experiments reveal that TIM substantially reduces running time compared to state-of-the-art solutions like RIS and CELF++ while maintaining or improving the quality of the solution. This efficient performance extends to various network structures and sizes, as confirmed by experiments on datasets like NetHEPT, Epinions, DBLP, LiveJournal, and Twitter.

Future Directions

The TIM algorithm paves the way for further research along several lines:

Distributed Computing: Extending TIM to distributed computing frameworks can facilitate handling even larger datasets that exceed the memory capacity of a single machine.
Extended Influence Models: Adaptations of TIM to other influence propagation models or competitive influence maximization scenarios could broaden its applicability.

Conclusion

Overall, TIM represents a significant advancement in the field of influence maximization, achieving a delicate balance between rigorous theoretical guarantees and compelling practical efficiency. Its ability to handle extremely large graphs while providing strong approximation guarantees makes it a valuable tool for both researchers and practitioners in network analysis and viral marketing.


Authors (3)

Youze Tang
Xiaokui Xiao
Yanchen Shi
