A Data-Based Approach to Social Influence Maximization (1109.6886v1)

Published 30 Sep 2011 in cs.DB

Abstract: Influence maximization is the problem of finding a set of users in a social network, such that by targeting this set, one maximizes the expected spread of influence in the network. Most of the literature on this topic has focused exclusively on the social graph, overlooking historical data, i.e., traces of past action propagations. In this paper, we study influence maximization from a novel data-based perspective. In particular, we introduce a new model, which we call credit distribution, that directly leverages available propagation traces to learn how influence flows in the network and uses this to estimate expected influence spread. Our approach also learns the different levels of influenceability of users, and it is time-aware in the sense that it takes the temporal nature of influence into account. We show that influence maximization under the credit distribution model is NP-hard and that the function that defines expected spread under our model is submodular. Based on these, we develop an approximation algorithm for solving the influence maximization problem that at once enjoys high accuracy compared to the standard approach, while being several orders of magnitude faster and more scalable.

Authors (3)

Amit Goyal (16 papers)
Francesco Bonchi (73 papers)
Laks V. S. Lakshmanan (59 papers)

Citations (465)


Summary

The paper introduces a novel credit distribution model that utilizes past propagation data to eliminate the need for costly Monte Carlo simulations.
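It proves that influence maximization under the credit distribution model is NP-hard and that the expected-spread function is submodular, which enables a greedy algorithm with a (1−1/e)-approximation guarantee.
Empirical results show that the credit distribution approach, solved with a CELF-optimized greedy algorithm, predicts actual spread more accurately than the traditional IC and LT models and runs several orders of magnitude faster than simulation-based methods.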
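The approximation guarantee mentioned above rests on a standard result about greedy maximization of set functions. As a brief reminder, and not a restatement of the paper's own proof, the block below writes out the diminishing-returns condition and the resulting bound. The symbols $\sigma$ (expected spread), $V$ (the set of users), $k$ (the seed budget), $S_{\text{greedy}}$, and $S^{*}$ are notation assumed here for illustration, and the bound additionally assumes that $\sigma$ is monotone and non-negative.

```latex
% Submodularity (diminishing returns): adding a user x helps a smaller
% seed set at least as much as it helps a larger one.
\sigma(S \cup \{x\}) - \sigma(S) \;\ge\; \sigma(T \cup \{x\}) - \sigma(T)
\qquad \text{for all } S \subseteq T \subseteq V,\ x \in V \setminus T .

% Greedy guarantee for a monotone, non-negative, submodular \sigma:
% the k seeds chosen greedily achieve at least a (1 - 1/e) fraction
% of the spread of the best possible k-seed set S^*.
\sigma(S_{\text{greedy}}) \;\ge\; \left(1 - \frac{1}{e}\right) \sigma(S^{*}),
\qquad |S_{\text{greedy}}| = |S^{*}| = k .
```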

A Data-Based Approach to Social Influence Maximization

The paper "A Data-Based Approach to Social Influence Maximization" by Goyal, Bonchi, and Lakshmanan addresses influence maximization in social networks, a problem with applications in viral marketing, personalized recommendations, and social media analysis. The work departs from traditional methods that rely exclusively on the structure of the social graph and instead introduces a data-driven model built on historical propagation data.

Key Contributions

Credit Distribution Model: The authors introduce the credit distribution (CD) model, which uses traces of past action propagations to estimate influence spread directly. The model removes the need to learn separate edge probabilities or to run the costly Monte Carlo (MC) simulations that are typical of approaches based on the Independent Cascade (IC) and Linear Threshold (LT) models.
Theoretical Insights: The paper shows that the influence maximization problem under the CD model is NP-hard. It also establishes that the function defining influence spread is submodular, which allows an approximation algorithm with a theoretical guarantee of a (1−1/e)-approximation to the optimal solution.
Empirical Evaluation: Through comprehensive experiments, the paper provides evidence that methods not leveraging real propagation data can result in poor seed set selection and large prediction errors. In contrast, the CD model yields better accuracy and scalability, outperforming IC and LT models in predicting actual spread.
Algorithmic Efficiency: The proposed greedy algorithm, enhanced by the CELF (cost-effective lazy forward) optimization, significantly reduces computation time and is several orders of magnitude faster than MC simulation-based methods. Experiments on real-world datasets indicate that this efficiency is achieved without sacrificing accuracy.
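To make the lazy-evaluation idea behind CELF concrete, here is a minimal sketch of a CELF-style greedy selection in Python. It is not the authors' implementation: the names `celf_greedy`, `coverage_spread`, and `REACH` are hypothetical, and the spread function is a toy set-coverage function standing in for the CD model's spread estimate. The only property it relies on is submodularity, which means a marginal gain computed in an earlier round is an upper bound on the current gain and can be reused to skip re-evaluations.

```python
import heapq

# Toy stand-in for a spread function (an assumption for illustration only):
# each candidate "reaches" a fixed set of users, and the spread of a seed set
# is the number of distinct users reached. Coverage is monotone and submodular.
REACH = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 2},
}


def coverage_spread(seeds):
    """Return the number of distinct users reached by the given seeds."""
    covered = set()
    for s in seeds:
        covered |= REACH[s]
    return len(covered)


def celf_greedy(candidates, spread, k):
    """Pick up to k seeds with lazy (CELF-style) greedy selection.

    Returns (selected seeds, their spread, number of spread evaluations).
    """
    selected = []
    current = spread(selected)
    evaluations = 0

    # Heap entries: (-gain, tie_breaker, node, round in which gain was computed).
    heap = []
    for i, node in enumerate(candidates):
        gain = spread([node]) - current
        evaluations += 1
        heapq.heappush(heap, (-gain, i, node, 0))

    while heap and len(selected) < k:
        neg_gain, i, node, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date for the current seed set, and by
            # submodularity no stale entry can beat it, so take this node.
            selected.append(node)
            current += -neg_gain
        else:
            # The gain is stale: recompute it against the current seed set
            # and push the node back with a fresh stamp.
            gain = spread(selected + [node]) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, i, node, len(selected)))

    return selected, current, evaluations


seeds, total, evals = celf_greedy(list(REACH), coverage_spread, k=2)
print(seeds, total, evals)
```

On this toy input the lazy version never re-evaluates candidate `d` in the second round, because its stale gain already cannot exceed the refreshed gain of `c`. With an expensive spread estimate and many candidates, skipped evaluations of this kind are where the savings come from.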

Implications and Future Directions

The paper has both practical and methodological implications. Practically, the CD model offers a scalable solution to influence maximization that remains applicable to large networks where simulation-based approaches become impractical. Methodologically, the results suggest that incorporating real propagation data yields more reliable outcomes, which challenges the prevailing reliance on purely graph-based models.

Speculatively, future developments could explore hybrid models that combine graph-based insights with propagation trace data, potentially enhancing predictive accuracy further. Additionally, extending this model to dynamically evolving networks could provide valuable insights into influence spread in rapidly changing environments, such as emerging social platforms.

Conclusion

Goyal et al. present a compelling case for a data-driven approach to influence maximization. By directly integrating historical propagation data, the CD model improves prediction accuracy while remaining scalable and efficient. The model invites a re-evaluation of existing influence maximization strategies and offers a strong foundation for subsequent research in social network analysis.
