Using Colors and Sketches to Count Subgraphs in a Streaming Graph (2302.12210v1)
Abstract: Suppose we wish to estimate $\#H$, the number of copies of some small graph $H$ in a large streaming graph $G$. There are many algorithms for this task when $H$ is a triangle, but just a few that apply to arbitrary $H$. Here we focus on one such algorithm, which was introduced by Kane, Mehlhorn, Sauerwald, and Sun. The storage and update time per edge for their algorithm are both $O(m^k/(\#H)^2)$, where $m$ is the number of edges in $G$, and $k$ is the number of edges in $H$. Here, we propose three modifications to their algorithm that can dramatically reduce both the storage and update time. Suppose that $H$ has no leaves and that $G$ has maximum degree $\leq m^{1/2 - \alpha}$, where $\alpha > 0$. Define $C = \min(m^{2\alpha}, m^{1/3})$. Then in our version of the algorithm, the update time per edge is $O(1)$, and the storage is approximately reduced by a factor of $C^{2k-t-2}$, where $t$ is the number of vertices in $H$; in particular, the storage is $O(C^2 + m^k/(C^{2k-t-2} (\#H)^2))$.
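To get a feel for the size of the claimed savings, the quantities in the abstract can be evaluated numerically. The following is a minimal sketch, not part of the paper: the function name `storage_reduction` and the sample parameter values are assumptions chosen for illustration, and the constants hidden by the $O(\cdot)$ notation (and by the word "approximately") are ignored.

```python
def storage_reduction(m: float, alpha: float, k: int, t: int) -> tuple[float, float]:
    """Return (C, C**(2k - t - 2)) for the quantities defined in the abstract.

    m     -- number of edges in the streaming graph G
    alpha -- slack in the degree bound: max degree <= m**(1/2 - alpha), alpha > 0
    k     -- number of edges in the pattern graph H
    t     -- number of vertices in H
    Hypothetical helper for illustration; constant factors are ignored.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # C = min(m^(2*alpha), m^(1/3)), as defined in the abstract.
    c = min(m ** (2 * alpha), m ** (1 / 3))
    # Approximate factor by which the storage is reduced.
    factor = c ** (2 * k - t - 2)
    return c, factor


if __name__ == "__main__":
    # Triangle: k = 3 edges, t = 3 vertices, so the exponent 2k - t - 2 is 1.
    print(storage_reduction(10**6, 0.25, k=3, t=3))
    # 4-cycle: k = 4 edges, t = 4 vertices, so the exponent is 2.
    print(storage_reduction(10**6, 0.25, k=4, t=4))
```

With $m = 10^6$ and $\alpha = 1/4$, we get $m^{2\alpha} = 10^3$ and $m^{1/3} = 10^2$, so $C$ is roughly $100$. Under these assumed values the storage shrinks by a factor of about $100$ for a triangle and about $10^4$ for a 4-cycle.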