Learning Hierarchical Structure of Clusterable Graphs (2207.02581v1)
Abstract: We consider the problem of learning the hierarchical cluster structure of graphs in the seeded model, where, besides the input graph, the algorithm is provided with a small number of "seeds", i.e., correctly clustered data points. In particular, we ask whether one can approximate the Dasgupta cost of a graph, a popular measure of hierarchical clusterability, in sublinear time and using a small number of seeds. Our main result is an $O(\sqrt{\log k})$ approximation to the Dasgupta cost of $G$ in $\approx \text{poly}(k)\cdot n^{1/2+O(\epsilon)}$ time using $\approx \text{poly}(k)\cdot n^{O(\epsilon)}$ seeds, effectively giving a sublinear-time simulation of the algorithm of Charikar and Chatziafratis [SODA'17] on clusterable graphs. To the best of our knowledge, ours is the first result on approximating the hierarchical clustering properties of such graphs in sublinear time.
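
For readers unfamiliar with the quantity being approximated: the abstract does not define it, but the Dasgupta cost of a hierarchical clustering tree $T$ of a weighted graph is commonly written as $\sum_{\{u,v\} \in E} w(u,v) \cdot |\text{leaves}(T[\text{lca}(u,v)])|$, so an edge is cheap when its endpoints are separated low in the tree. The sketch below computes this cost exactly on a toy graph. It is only an illustration of the definition, not the paper's sublinear-time algorithm; the nested-tuple tree encoding and the names `leaves`, `dasgupta_cost`, `good_tree`, and `bad_tree` are assumptions made for this example.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree (illustrative encoding)."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def dasgupta_cost(tree, weights):
    """Exact Dasgupta cost of `tree` for an undirected weighted graph.

    `weights` maps frozenset({u, v}) to the edge weight w(u, v).
    An edge is charged at the node where its endpoints first fall into
    different children (their lowest common ancestor), and pays
    w(u, v) times the number of leaves under that node.
    """
    if not isinstance(tree, tuple):
        return 0  # a single leaf separates no edges
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(s) for s in child_leaves)
    cost = sum(dasgupta_cost(child, weights) for child in tree)
    for a, b in combinations(child_leaves, 2):
        for u in a:
            for v in b:
                cost += weights.get(frozenset((u, v)), 0) * size
    return cost


# Toy graph (assumed for illustration): two unit-weight triangles
# {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
weights = {frozenset(e): 1 for e in edges}

good_tree = (((0, 1), 2), ((3, 4), 5))   # splits the two triangles at the root
bad_tree = (((0, 3), (1, 4)), (2, 5))    # mixes vertices of the two triangles

print(dasgupta_cost(good_tree, weights))  # 22
print(dasgupta_cost(bad_tree, weights))   # 38
```

The tree that respects the two clusters has the lower cost, which is the sense in which Dasgupta cost measures hierarchical clusterability. This direct computation reads every edge, whereas the result in the abstract estimates the cost without doing so.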