
A Compiler for Operations on Relations with Bag Semantics

Published 10 Feb 2025 in cs.PL and cs.DB | (2502.06988v1)

Abstract: We describe an abstract loop-based intermediate representation (IR) that can express fused implementations of relational algebra expressions on sets and bags (multisets). The loops are abstracted away from physical data structures, which makes the IR easier to generate, reason about, and optimize (for example, by fusion). The IR supports the natural relational algebra as well as complex operators used in production database systems, including outer joins, non-equi joins, and differences. We then show how to compile this IR to efficient C++ code that co-iterates over the physical data structures present in the relational algebra expression. Our approach lets us express fusion across disparate operators, leading to a 3.87x speedup (range 0.77x to 12.23x) on selected LSQB benchmarks and worst-case optimal triangle queries. We also demonstrate that our compiler generates high-quality code: its sequential performance on TPC-H is similar to Hyper's, with a 1.00x speedup (range 0.38x to 4.34x), and its parallel performance is competitive, with a 0.61x speedup (range 0.23x to 1.80x). Finally, our approach is portable across data structures.
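
The abstract describes fusion and co-iteration only at a high level. As a rough illustration (a hypothetical Python sketch, not the paper's IR or its generated C++), the code below computes SUM(amount) over a bag-semantics equi-join in two ways: by materializing the join, and by a single fused merge loop that co-iterates both key-sorted inputs. Encoding a bag as a key-sorted list of (row, multiplicity) pairs, and every function name here, are assumptions made for this example.

```python
from collections import Counter

# Hypothetical bag encoding (an assumption, not the paper's IR): a relation is
# a list of (row, multiplicity) pairs, sorted by the join key row[0].


def join_materialized(left, right):
    """Unfused baseline: build the full bag-semantics equi-join on row[0]."""
    out = Counter()
    for lrow, lm in left:
        for rrow, rm in right:
            if lrow[0] == rrow[0]:
                # Bag join: multiplicities of matching rows multiply.
                out[lrow + rrow[1:]] += lm * rm
    return out


def sum_over_join_unfused(left, right, col):
    """Materialize the joined bag, then aggregate column `col` over it."""
    joined = join_materialized(left, right)
    return sum(row[col] * m for row, m in joined.items())


def sum_over_join_fused(left, right, col):
    """Fused join + SUM: one merge loop co-iterates both sorted inputs and
    folds each match directly into the aggregate (no intermediate bag)."""
    i = j = 0
    total = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0][0], right[j][0][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Delimit the run of rows sharing this key on each side.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0][0] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][0][0] == rk:
                j_end += 1
            for lrow, lm in left[i:i_end]:
                for rrow, rm in right[j:j_end]:
                    total += (lrow + rrow[1:])[col] * lm * rm
            i, j = i_end, j_end
    return total


orders = [((1, 10), 2), ((1, 5), 1), ((2, 7), 1), ((3, 4), 3)]  # (cust_id, amount)
regions = [((1, "eu"), 2), ((3, "us"), 1)]                      # (cust_id, region)
print(sum_over_join_fused(orders, regions, 1))    # 62
print(sum_over_join_unfused(orders, regions, 1))  # 62
```

The fused version never builds the intermediate bag; because each matching pair still contributes with multiplicity `lm * rm`, both versions agree. The requirement that inputs be sorted by key belongs to this particular merge-based sketch only; the abstract states that the actual approach is portable across data structures.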

Authors (2)
