
Fast and Space-Efficient Parallel Algorithms for Influence Maximization (2311.07554v2)

Published 13 Nov 2023 in cs.DS and cs.DC

Abstract: Influence Maximization (IM) is a crucial problem in data science. The goal is to find a fixed-size set of highly influential seed vertices in a network that maximizes the influence spread along the edges. While IM is NP-hard under commonly used diffusion models, a greedy algorithm that repeatedly selects the vertex with the highest marginal gain in influence as the next seed achieves a $(1-1/e)$-approximation. Because of this theoretical guarantee, a rich literature focuses on improving the performance of the greedy algorithm. To estimate the marginal gain, existing work either runs Monte Carlo (MC) simulations of influence spread or pre-stores hundreds of sketches (usually per-vertex information). However, these approaches can be inefficient in time (MC simulation) or space (storing sketches), which prevents them from scaling to today's large-scale graphs. This paper significantly improves the scalability of IM using two key techniques. The first is a sketch-compression technique for the independent cascade model on undirected graphs, which combines the simulation and sketching approaches to achieve a time-space tradeoff. The second is a set of new data structures for parallel seed selection. Using these approaches, we implemented PaC-IM: Parallel and Compressed IM. We compare PaC-IM with state-of-the-art parallel IM systems on a 96-core machine with 1.5 TB of memory. PaC-IM can process large-scale graphs with up to 900M vertices and 74B edges in about 2 hours. On average across all tested graphs, our uncompressed version is 5--18$\times$ faster and about 1.4$\times$ more space-efficient than existing parallel IM systems. Using compression further saves 3.8$\times$ space with only 70% overhead in time on average.
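
To make the baseline in the abstract concrete, the following is a minimal sequential sketch of the textbook greedy algorithm with Monte Carlo estimation of influence spread. It is not PaC-IM and uses none of the paper's compression or parallel data structures. The function names (`simulate_ic`, `estimate_spread`, `greedy_im`), the uniform activation probability `p`, the number of runs, and the adjacency-dict graph format are all assumptions made for illustration.

```python
import random
from collections import deque


def simulate_ic(adj, seeds, p, rng):
    """One Monte Carlo run of an independent cascade from `seeds`.

    Each newly activated vertex gets a single chance to activate each
    inactive neighbor with probability p (assumed uniform here).
    Returns the number of activated vertices.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)


def estimate_spread(adj, seeds, p, runs, rng):
    """Average activated-set size over `runs` independent simulations."""
    return sum(simulate_ic(adj, seeds, p, rng) for _ in range(runs)) / runs


def greedy_im(adj, k, p=0.1, runs=200, seed=0):
    """Pick k seeds, each time adding the vertex with the best estimated spread.

    Maximizing the spread of (chosen + [v]) is equivalent to maximizing the
    marginal gain of v, since the spread of `chosen` is fixed in each round.
    """
    rng = random.Random(seed)
    chosen = []
    spread = 0.0
    for _ in range(min(k, len(adj))):
        best_v, best_spread = None, -1.0
        for v in adj:
            if v in chosen:
                continue
            s = estimate_spread(adj, chosen + [v], p, runs, rng)
            if s > best_spread:
                best_v, best_spread = v, s
        chosen.append(best_v)
        spread = best_spread
    return chosen, spread
```

Every round re-simulates the cascade for every candidate vertex, so the cost grows with the number of vertices, the number of seeds, and the number of runs. This is the time inefficiency of the simulation approach that the abstract refers to.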

Authors (4)
  1. Letong Wang (8 papers)
  2. Xiangyun Ding (6 papers)
  3. Yan Gu (83 papers)
  4. Yihan Sun (35 papers)
