Hyperedge Estimation using Polylogarithmic Subset Queries (1908.04196v4)
Abstract: In this work, we estimate the number of hyperedges in a hypergraph ${\cal H}(U({\cal H}), {\cal F}({\cal H}))$, where $U({\cal H})$ denotes the set of vertices and ${\cal F}({\cal H})$ denotes the set of hyperedges. We assume query oracle access to the hypergraph ${\cal H}$. Estimating the number of edges, triangles, or small subgraphs in a graph is a well-studied problem. Beame et al. and Bhattacharya et al. gave algorithms to estimate the number of edges and triangles in a graph using queries to the Bipartite Independent Set (BIS) and the Tripartite Independent Set (TIS) oracles, respectively. We generalize these earlier works by estimating the number of hyperedges using a query oracle, known as the Generalized $d$-partite Independent Set (GPIS) oracle, that takes $d$ non-empty, pairwise disjoint subsets of vertices $A_1,\ldots,A_d \subseteq U({\cal H})$ as input and answers whether there exists a hyperedge in ${\cal H}$ having exactly one vertex in each $A_i$, $i \in \{1,2,\ldots,d\}$. We give a randomized algorithm for the hyperedge estimation problem that uses the GPIS query oracle to output an estimate $\widehat{m}$ of $m({\cal H})$, the number of hyperedges, satisfying $(1-\epsilon) \cdot m({\cal H}) \leq \widehat{m} \leq (1+\epsilon) \cdot m({\cal H})$. The number of queries made by our algorithm, assuming $d$ to be a constant, is polylogarithmic in the number of vertices of the hypergraph.
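To make the query model concrete, the following is a minimal Python sketch of what a single GPIS query answers and of the accuracy guarantee on $\widehat{m}$. It is an illustration only, not the paper's algorithm. It assumes the hypergraph is given explicitly as a list of vertex sets, which the estimation algorithm itself never sees, since it only has oracle access. The names `gpis_query`, `within_tolerance`, and `example_edges` are hypothetical and do not come from the paper.

```python
def gpis_query(hyperedges, parts):
    """Brute-force GPIS answer: True iff some hyperedge has exactly one
    vertex in each subset A_i in `parts`.

    Assumption: `hyperedges` is an explicit iterable of vertex collections.
    A real oracle would hide this list from the caller.
    """
    parts = [frozenset(p) for p in parts]
    # The oracle is only defined for non-empty, pairwise disjoint subsets.
    if any(len(p) == 0 for p in parts):
        raise ValueError("every subset A_i must be non-empty")
    if len(frozenset().union(*parts)) != sum(len(p) for p in parts):
        raise ValueError("subsets A_1, ..., A_d must be pairwise disjoint")
    for edge in hyperedges:
        edge = frozenset(edge)
        # The edge must meet each A_i in exactly one vertex.
        if all(len(edge & p) == 1 for p in parts):
            return True
    return False


def within_tolerance(m_hat, m, eps):
    """Check the guarantee (1 - eps) * m <= m_hat <= (1 + eps) * m."""
    return (1 - eps) * m <= m_hat <= (1 + eps) * m


# Hypothetical toy hypergraph with d = 3 and three hyperedges.
example_edges = [{1, 2, 3}, {2, 3, 4}, {4, 5, 6}]

if __name__ == "__main__":
    print(gpis_query(example_edges, [{1}, {2}, {3}]))
    print(gpis_query(example_edges, [{1}, {4}, {5}]))
    print(within_tolerance(95, 100, 0.1))
```

The brute-force scan costs time linear in the number of hyperedges per query. The point of the paper's result is the number of such queries an estimator needs, not the cost of answering one.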