Shortest Path Centrality and the APSP problem via VC-dimension and Rademacher Averages (1911.13144v4)
Abstract: In this paper we are interested in a version of the All-pairs Shortest Paths problem (APSP) that fits neither in the exact nor in the approximate case. We define a measure of centrality of a shortest path, related to the ``importance'' of such shortest path in the graph, and propose an algorithm based on the idea of progressive sampling that, for {\it any fixed constants} $0 < \epsilon$, $ \delta < 1$, given an undirected graph $G$ with non-negative edge weights, outputs with probability $1 - \delta$ a data structure of size $n \cdot \textrm{Diam}_V(G)$, where $\textrm{Diam}_V(G)$ is the vertex diameter of $G$, in expected time $\mathcal{O}(\lg n \max(m + n \log n, n \cdot \textrm{Diam}_V(G)))$ containing the (exact) distance and the shortest path between every pair of vertices $(u,v)$ that has centrality at least $\epsilon$. The progressive sampling technique is sensitive to the probability distribution of the input (if we assume that $G$ is chosen from a prescribed random distribution), but even in the case where we take no assumption about such distribution, we show an upper bound for the sample size using VC-dimension theory that is tighter than the bound given by standard Hoeffding and union bounds, since VC-dimension theory captures the combinatorial structure of the input graph.