Flexible Caching in Trie Joins (1602.08721v1)
Abstract: Traditional algorithms for multiway join computation are based on reordering the joins and combining the results of intermediate subqueries. Recently, several approaches have proposed join algorithms that are "worst-case optimal," in which all relations are scanned simultaneously. An example is Veldhuizen's Leapfrog Trie Join (LFTJ). An important advantage of LFTJ is its small memory footprint, due to the fact that intermediate results are full tuples that can be dumped immediately. However, since the algorithm does not store intermediate results, recurring joins must be reconstructed from the source relations, resulting in excessive memory traffic. In this paper, we address this problem by incorporating caches into LFTJ. We do so by adopting recent developments in join optimization that tie the variable ordering to a tree decomposition of the query. While the traditional use of tree decompositions computes the result of each bag in advance, our proposed approach incorporates caching directly into LFTJ and can dynamically adjust the size of the cache. Consequently, our solution balances memory usage against repeated computation, as confirmed by our experiments over SNAP datasets.
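To make the ideas in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. The function leapfrog_intersect illustrates the seek-to-maximum intersection step that LFTJ performs at each join variable, and cached_subjoin illustrates how a recurring sub-join, keyed only by the variable it shares with the rest of the query (the role a bag of the tree decomposition plays in the paper), can be memoized under a bounded cache. The relations R, S, T, the function names, and the cache size of 1024 are all illustrative assumptions.

```python
from bisect import bisect_left
from functools import lru_cache

def leapfrog_intersect(sorted_lists):
    """Intersect k sorted, duplicate-free lists by repeatedly seeking each
    list forward to the largest key seen so far (the leapfrog step that a
    trie join applies at every variable of the join)."""
    k = len(sorted_lists)
    if k == 0 or any(not lst for lst in sorted_lists):
        return []
    pos = [0] * k                          # current cursor in each list
    out = []
    hi = max(lst[0] for lst in sorted_lists)
    i, matches = 0, 0                      # matches = lists in a row sitting at hi
    while True:
        lst = sorted_lists[i]
        p = bisect_left(lst, hi, pos[i])   # seek to the first element >= hi
        if p == len(lst):
            return out                     # one list is exhausted: done
        pos[i] = p
        if lst[p] == hi:
            matches += 1
            if matches >= k:               # every list contains hi: emit it
                out.append(hi)
                pos[i] = p + 1
                if pos[i] == len(lst):
                    return out
                hi, matches = lst[pos[i]], 1
        else:
            hi, matches = lst[p], 1        # new, larger candidate key
        i = (i + 1) % k

# Toy relations for Q(a, b, c) = R(a, b), S(b, c), T(b, c), stored trie-style
# as dicts mapping the first attribute to a sorted list of second attributes.
R = {1: [2, 3], 4: [2, 3], 5: [3]}
S = {2: [7, 8], 3: [8, 9]}
T = {2: [8], 3: [8, 9]}

@lru_cache(maxsize=1024)                   # bounded cache; the limit is tunable
def cached_subjoin(b):
    """c-values joining S and T below a fixed binding of b; memoized so that
    different a-bindings reaching the same b reuse the result instead of
    rescanning the source relations."""
    return tuple(leapfrog_intersect([S.get(b, []), T.get(b, [])]))

def query():
    """Evaluate Q with variable order a, b, c. The c-level sub-join depends
    only on b, so it recurs across a-bindings and benefits from the cache."""
    s_keys, t_keys = sorted(S), sorted(T)
    answers = []
    for a in sorted(R):
        for b in leapfrog_intersect([R[a], s_keys, t_keys]):
            for c in cached_subjoin(b):
                answers.append((a, b, c))
    return answers

if __name__ == "__main__":
    # Expected: (1,2,8), (1,3,8), (1,3,9), (4,2,8), (4,3,8), (4,3,9), (5,3,8), (5,3,9)
    print(query())
```

Setting maxsize in this sketch stands in for the dynamic cache-size adjustment described in the abstract: a larger cache trades memory for fewer recomputations of the recurring sub-join, which is exactly the balance the paper's approach aims to control.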