Output-Optimal Algorithms for Join-Aggregate Queries (2406.05536v5)
Abstract: One of the most celebrated results on computing join-aggregate queries defined over commutative semi-rings is the classic Yannakakis algorithm, proposed in 1981. It is known that the runtime of the Yannakakis algorithm is $O(N + \OUT)$ for any free-connex query, where $N$ is the input size of the database and $\OUT$ is the output size of the query result. This is already output-optimal. However, only an upper bound of $O(N \cdot \OUT)$ on the runtime is known for the large remaining class of acyclic but non-free-connex queries. Alternatively, one can convert a non-free-connex query into a free-connex one using tree decomposition techniques and then run the Yannakakis algorithm. This approach takes $O\left(N^{\#\fnsubw} + \OUT\right)$ time, where $\#\fnsubw$ is the {\em free-connex sub-modular width} of the input query. However, neither of these results is known to be output-optimal. In this paper, we show a matching lower and upper bound $\Theta\left(N \cdot \OUT^{1 - \frac{1}{\fnfhtw}} + \OUT\right)$ for computing general acyclic join-aggregate queries by {\em semiring algorithms}, where $\fnfhtw$ is the {\em free-connex fractional hypertree width} of the query. For example, $\fnfhtw = 1$ for free-connex queries, $\fnfhtw = 2$ for line queries (a.k.a. chain matrix multiplication), and $\fnfhtw = k$ for star queries (a.k.a. star matrix multiplication) with $k$ relations. While this measure has been defined before, we are the first to use it to characterize the output-optimal complexity of acyclic join-aggregate queries. To our knowledge, this is the first polynomial improvement over the Yannakakis algorithm in the last 40 years, and it completely resolves the open question of an output-optimal algorithm for computing acyclic join-aggregate queries.
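
To make the stated bound concrete, the following sketch substitutes the three example widths from the abstract into it. The shorthand $w$ for $\fnfhtw$ and the notation $\mathrm{OUT}$ for $\OUT$ are assumptions introduced only for this illustration; the expressions follow by plain arithmetic on the exponent $1 - \frac{1}{w}$ and add no claim beyond the abstract.

```latex
% Assumed shorthand: w denotes the free-connex fractional hypertree width.
\begin{align*}
  \text{general bound:} \quad
    & \Theta\!\left(N \cdot \mathrm{OUT}^{\,1 - \frac{1}{w}} + \mathrm{OUT}\right) \\
  \text{free-connex query } (w = 1): \quad
    & N \cdot \mathrm{OUT}^{0} + \mathrm{OUT} = N + \mathrm{OUT} \\
  \text{line query } (w = 2): \quad
    & N \cdot \mathrm{OUT}^{\frac{1}{2}} + \mathrm{OUT} = N\sqrt{\mathrm{OUT}} + \mathrm{OUT} \\
  \text{star query with } k \text{ relations } (w = k): \quad
    & N \cdot \mathrm{OUT}^{\,1 - \frac{1}{k}} + \mathrm{OUT}
\end{align*}
```

The $w = 1$ case recovers the $O(N + \OUT)$ runtime quoted for free-connex queries. For a line query, the first term $N\sqrt{\mathrm{OUT}}$ is smaller than the previously known $N \cdot \mathrm{OUT}$ by a factor of $\sqrt{\mathrm{OUT}}$.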