StaticGreedy: solving the scalability-accuracy dilemma in influence maximization (1212.4779v3)

Published 19 Dec 2012 in cs.SI, cs.DS, and physics.soc-ph

Abstract: Influence maximization, defined as a problem of finding a set of seed nodes to trigger a maximized spread of influence, is crucial to viral marketing on social networks. For practical viral marketing on large scale social networks, it is required that influence maximization algorithms should have both guaranteed accuracy and high scalability. However, existing algorithms suffer a scalability-accuracy dilemma: conventional greedy algorithms guarantee the accuracy with expensive computation, while the scalable heuristic algorithms suffer from unstable accuracy. In this paper, we focus on solving this scalability-accuracy dilemma. We point out that the essential reason of the dilemma is the surprising fact that the submodularity, a key requirement of the objective function for a greedy algorithm to approximate the optimum, is not guaranteed in all conventional greedy algorithms in the literature of influence maximization. Therefore a greedy algorithm has to afford a huge number of Monte Carlo simulations to reduce the pain caused by unguaranteed submodularity. Motivated by this critical finding, we propose a static greedy algorithm, named StaticGreedy, to strictly guarantee the submodularity of influence spread function during the seed selection process. The proposed algorithm makes the computational expense dramatically reduced by two orders of magnitude without loss of accuracy. Moreover, we propose a dynamical update strategy which can speed up the StaticGreedy algorithm by 2-7 times on large scale social networks.

Authors (5)

Suqi Cheng
Huawei Shen
Junming Huang
Guoqing Zhang
Xueqi Cheng

Citations (199)


Summary

Solving the Scalability-Accuracy Dilemma in Influence Maximization: StaticGreedy

The paper "StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization" addresses a central challenge for viral marketing on social networks: balancing scalability and accuracy in influence maximization algorithms. Influence maximization is the problem of identifying a set of seed nodes in a network that maximizes the subsequent spread of influence. Existing solutions face a dilemma: conventional greedy algorithms guarantee accuracy but are too computationally expensive for large networks, while heuristic algorithms scale well but deliver unstable accuracy.

Core Proposition

The authors identify the root cause of this dilemma as the lack of a strict guarantee of submodularity in existing greedy algorithms. Submodularity, the property a greedy algorithm relies on to approximate the optimum, means that the marginal gain from adding a node to a set does not increase as the set grows. Without this guarantee, a greedy algorithm has to run a huge number of Monte Carlo simulations to keep its spread estimates reliable, which drives up the computational expense significantly.
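For reference, the standard definition of submodularity and the classical greedy guarantee can be written as below. The symbols (σ for the influence spread function, V for the node set, S and T for seed sets, k for the seed budget, S_k for the greedy solution) are notation chosen here for illustration and are not taken from the summary.

```latex
% Submodularity (diminishing marginal gains):
\sigma(S \cup \{v\}) - \sigma(S) \;\ge\; \sigma(T \cup \{v\}) - \sigma(T)
\qquad \text{for all } S \subseteq T \subseteq V,\; v \in V \setminus T .

% Greedy guarantee for a non-negative, monotone, submodular \sigma:
\sigma(S_k) \;\ge\; \Bigl(1 - \tfrac{1}{e}\Bigr) \max_{|S| \le k} \sigma(S) .
```

The second inequality holds only when the function being maximized is actually monotone and submodular, which is why a guarantee of submodularity during seed selection matters.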

To address this issue, the paper introduces StaticGreedy, a novel algorithm that guarantees submodularity by reusing static Monte Carlo simulations throughout the seed selection process. This approach dramatically reduces the computational cost by two orders of magnitude without sacrificing accuracy. The critical advancement of StaticGreedy lies in its strategic use of a static collection of Monte Carlo-generated network snapshots to accurately estimate influence spread.

Contributions and Implementation

The paper makes the following significant contributions:

It demonstrates that failing to guarantee submodularity leads to excessive computational requirements in existing greedy algorithms.
It proposes StaticGreedy, an algorithm that reuses Monte Carlo snapshot results, ensuring submodularity and monotonicity properties while significantly reducing the computational burden.
It introduces a dynamic update strategy to further optimize the StaticGreedy process, achieving speed-ups of 2-7 times on large networks.

The StaticGreedy algorithm operates in two stages: generating a fixed number of static snapshots of the network and employing a greedy strategy to iteratively select seed nodes using these snapshots. This ensures consistent influence spread estimations across iterations, maintaining submodularity and allowing for more efficient computations.
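The two stages can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an independent cascade model with a single uniform edge probability p, and the function names (sample_snapshots, reach, static_greedy) and parameters are hypothetical. It also omits the dynamic update strategy and recomputes every marginal gain from scratch.

```python
import random
from collections import deque

def sample_snapshots(edges, p, num_snapshots, rng):
    """Stage 1: build static snapshots, keeping each directed edge with probability p."""
    snapshots = []
    for _ in range(num_snapshots):
        adj = {}
        for u, v in edges:
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
        snapshots.append(adj)
    return snapshots

def reach(adj, start, covered):
    """Nodes reachable from start in one snapshot that are not already covered.

    `covered` is closed under reachability, so the search can stop at covered nodes.
    """
    if start in covered:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen and v not in covered:
                seen.add(v)
                queue.append(v)
    return seen

def static_greedy(nodes, edges, p, k, num_snapshots=100, seed=0):
    """Stage 2: greedy seed selection that reuses the same snapshots in every round."""
    rng = random.Random(seed)
    snapshots = sample_snapshots(edges, p, num_snapshots, rng)
    covered = [set() for _ in snapshots]  # nodes already reached by the seeds, per snapshot
    seeds = []
    for _ in range(k):
        best, best_gain = None, -1
        for v in nodes:
            if v in seeds:
                continue
            # Marginal gain of v, summed over the fixed snapshots.
            gain = sum(len(reach(adj, v, cov)) for adj, cov in zip(snapshots, covered))
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
        for adj, cov in zip(snapshots, covered):
            cov |= reach(adj, best, cov)
    estimated_spread = sum(len(cov) for cov in covered) / num_snapshots
    return seeds, estimated_spread
```

Because every round evaluates candidates on the same snapshots, the estimated spread is an average of reachability counts over fixed graphs, and such coverage-style functions are monotone and submodular.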

Experimental Evaluation

The effectiveness and efficiency of StaticGreedy are evaluated against several benchmarks on multiple datasets, ranging from moderately sized networks such as NetHEPT and NetPHY to larger networks such as DBLP and Douban. The results show that StaticGreedy consistently matches or exceeds the accuracy of conventional greedy algorithms while significantly reducing computation time. This holds under both the Uniform Independent Cascade (UIC) and Weighted Independent Cascade (WIC) models, which indicates that the approach is robust to the choice of diffusion model.
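The summary does not define the two diffusion models. As a rough sketch, assuming the conventions commonly used in the influence maximization literature (a single constant probability on every edge for UIC, and one over the in-degree of the target node for WIC), the edge probabilities could be assigned as follows. The function names and the default p = 0.01 are illustrative assumptions.

```python
from collections import Counter

def uic_probabilities(edges, p=0.01):
    """Uniform IC: every directed edge gets the same propagation probability p (assumed value)."""
    return {(u, v): p for u, v in edges}

def wic_probabilities(edges):
    """Weighted IC: edge (u, v) gets probability 1 / in-degree(v) (assumed convention)."""
    indegree = Counter(v for _, v in edges)
    return {(u, v): 1.0 / indegree[v] for u, v in edges}
```

For example, with edges (0, 2), (1, 2) and (0, 1), the weighted variant assigns 0.5 to each edge into node 2 and 1.0 to the single edge into node 1.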

Implications and Future Work

This research matters for both the theory and the practice of influence maximization. By guaranteeing submodularity during seed selection, StaticGreedy removes a major computational bottleneck and makes influence maximization feasible on large-scale networks where conventional greedy methods are too expensive.

The paper also opens avenues for further research, including determining optimal numbers of Monte Carlo simulations for various network structures, adapting the algorithm to different diffusion models, and implementing the StaticGreedy algorithm in parallel computing environments to enhance scalability further.

In summary, the StaticGreedy algorithm represents a noteworthy advancement in the field of influence maximization, offering a scalable and accurate alternative to existing methods and paving the way for more efficient viral marketing strategies on vast social networks.
