Approximating Single-Source Personalized PageRank with Absolute Error Guarantees (2401.01019v1)
Abstract: Personalized PageRank (PPR) is an extensively studied and applied node proximity measure in graphs. For a pair of nodes $s$ and $t$ on a graph $G=(V,E)$, the PPR value $\pi(s,t)$ is defined as the probability that an $\alpha$-discounted random walk from $s$ terminates at $t$, where the walk terminates with probability $\alpha$ at each step. We study the classic Single-Source PPR query, which asks for PPR approximations from a given source node $s$ to all nodes in the graph. Specifically, we aim to provide approximations with absolute error guarantees, ensuring that the resultant PPR estimates $\hat{\pi}(s,t)$ satisfy $\max_{t\in V}\big|\hat{\pi}(s,t)-\pi(s,t)\big|\le\varepsilon$ for a given error bound $\varepsilon$. We propose an algorithm that achieves this with high probability, with an expected running time of

- $\widetilde{O}\big(\sqrt{m}/\varepsilon\big)$ for directed graphs, where $m=|E|$;
- $\widetilde{O}\big(\sqrt{d_{\mathrm{max}}}/\varepsilon\big)$ for undirected graphs, where $d_{\mathrm{max}}$ is the maximum node degree in the graph;
- $\widetilde{O}\left(n^{\gamma-1/2}/\varepsilon\right)$ for power-law graphs, where $n=|V|$ and $\gamma\in\left(\frac{1}{2},1\right)$ is the extent of the power law.

These sublinear bounds improve upon existing results. We also study the case when degree-normalized absolute error guarantees are desired, requiring $\max_{t\in V}\big|\hat{\pi}(s,t)/d(t)-\pi(s,t)/d(t)\big|\le\varepsilon_d$ for a given error bound $\varepsilon_d$, where the graph is undirected and $d(t)$ is the degree of node $t$. We give an algorithm that provides this error guarantee with high probability, achieving an expected complexity of $\widetilde{O}\left(\sqrt{\sum_{t\in V}\pi(s,t)/d(t)}\big/\varepsilon_d\right)$. This improves over the previously known $O(1/\varepsilon_d)$ complexity.
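To make the definitions in the abstract concrete, the following is a minimal Python sketch of the quantities involved. It is not the algorithm proposed in the paper: it uses plain Monte Carlo sampling of $\alpha$-discounted random walks as a baseline estimator, and compares it against a reference solution obtained by iterating the walk distribution. The function names, the toy undirected graph, the value $\alpha=0.2$, and the number of walks are all assumptions chosen for illustration, and the sketch assumes every node has at least one neighbor.

```python
import random


def ppr_reference(adj, s, alpha, iters=200):
    """Reference single-source PPR vector (illustrative helper, not from the paper).

    Tracks the distribution of a walk that has not yet terminated; at each
    step a fraction alpha of the mass at every node terminates there.
    Assumes every node has at least one out-neighbor.
    """
    n = len(adj)
    pi = [0.0] * n
    alive = [0.0] * n
    alive[s] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        for u in range(n):
            mass = alive[u]
            if mass == 0.0:
                continue
            pi[u] += alpha * mass                  # walk terminates at u
            share = (1.0 - alpha) * mass / len(adj[u])
            for v in adj[u]:                       # walk moves to a uniform neighbor
                nxt[v] += share
        alive = nxt
    return pi


def ppr_monte_carlo(adj, s, alpha, num_walks, rng):
    """Estimate pi(s, t) for all t as the fraction of walks from s ending at t."""
    counts = [0] * len(adj)
    for _ in range(num_walks):
        u = s
        # With probability alpha the walk stops at the current node.
        while rng.random() >= alpha:
            u = rng.choice(adj[u])
        counts[u] += 1
    return [c / num_walks for c in counts]


# Toy undirected graph as adjacency lists (hypothetical example data).
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
alpha = 0.2
exact = ppr_reference(adj, 0, alpha)
est = ppr_monte_carlo(adj, 0, alpha, num_walks=100_000, rng=random.Random(0))

# Absolute error: max_t |est(s,t) - pi(s,t)|.
abs_err = max(abs(a - b) for a, b in zip(est, exact))
# Degree-normalized absolute error: max_t |est(s,t)/d(t) - pi(s,t)/d(t)|.
deg_err = max(abs(a - b) / len(adj[t]) for t, (a, b) in enumerate(zip(est, exact)))
print(abs_err, deg_err)
```

In this baseline, shrinking the absolute error by a constant factor requires roughly the square of that factor more walks, which is the kind of cost that the bounds stated in the abstract are measured against.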