What if an SQL Statement Returned a Database? (2312.00638v1)

Published 1 Dec 2023 in cs.DB

Abstract: Every SQL statement is limited to returning a single, possibly denormalized, table. This design decision has far-reaching consequences. (1) For database users, it leads to slow query performance, long query result transfer times, and usability issues of SQL in web applications and object-relational mappers. (2) For database architects, it has consequences when designing query optimizers, leading to logical (algebraic) join enumeration effort, memory consumption for intermediate result materialization, and physical operator selection effort. So basically, the entire query optimization stack is shaped by that design decision. In this paper, we argue that the single-table limitation should be dropped. We extend the SELECT-clause of SQL by a keyword 'RESULTDB' to support returning a result database. Our approach has clear semantics, i.e., our extended SQL returns subsets of all tables with only those tuples that would be part of the traditional (single-table) query result set, however without performing any denormalization through joins. Our SQL-extension is downward compatible. Moreover, we discuss the surprisingly long list of benefits of our approach. First, for database users: far simpler and more readable application code, better query performance, smaller query results, better query result transfer times. Second, for database architects, we present how to leverage existing closed-source systems as well as change open-source database systems to support our feature. We propose a couple of algorithms to integrate our feature into both closed-source and open-source database systems. We present an initial experimental study with promising results.

Summary

The paper introduces the RESULTDB extension that enables SQL to return a subdatabase and preserve relational context instead of a denormalized table.
It evaluates various methods including dynamic and materialized SELECT DISTINCT queries alongside Yannakakis-based algorithms for acyclic join optimization.
The approach reduces data redundancy and transmission costs while simplifying application development by maintaining key relational information.

Analyzing the Constraints and Enhancement of SQL: Introduction of RESULTDB

The paper "What if an SQL Statement Returned a Database?" examines the limitations imposed by SQL's design choice of returning every query result as a single, possibly denormalized table. The authors argue that this choice has wide-ranging implications for both database users and database architects.

Problems with Single Table Returns

The restriction to single-table query results in SQL imposes several technical challenges:

Redundant Data: Result tables often contain redundant data, particularly when joins are involved. A join result repeats the values of a tuple once for every matching partner tuple, so the same values are stored and transmitted many times.
Information Loss: Key and relational information is often lost when denormalizing data during joins. This makes it difficult to maintain the original context and relationships of the data in the output.
Data Transmission Costs: Larger result tables with redundant data inflate the cost of data transmission from the server to the client, posing a significant performance bottleneck, especially for large datasets.
Memory Consumption: Materializing large intermediate join results in memory can lead to high memory usage and inefficiencies during query processing.
Complexity in Application Code: Developers often have to write additional code to transform these flat, denormalized results back into a form that their applications can use effectively. This additional complexity increases the potential for errors and inefficiency.

Proposed Solution: RESULTDB

In response to these challenges, the paper proposes an extension to SQL that adds a RESULTDB keyword to the SELECT clause. The SELECT RESULTDB syntax allows a query to return a subdatabase instead of a single table. This subdatabase includes the necessary subsets of the original tables without denormalizing the data through joins.
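The semantics described above can be illustrated with a small sketch. It is plain Python rather than the paper's SQL syntax, the three relations and their columns are hypothetical, and it ignores projection: it only shows which tuples of each relation end up in the subdatabase.

```python
# Toy relations as lists of tuples (hypothetical schema, not taken from the paper).
customers = [(1, "Ada"), (2, "Bob"), (3, "Cy")]  # (customer_id, name)
orders = [(10, 1), (11, 1), (12, 2)]  # (order_id, customer_id)
items = [(100, 10, "disk"), (101, 10, "cable"), (102, 11, "disk")]  # (item_id, order_id, product)


def matching_combinations():
    """Yield every (customer, order, item) combination that satisfies both join predicates."""
    for c in customers:
        for o in orders:
            for i in items:
                if c[0] == o[1] and o[0] == i[1]:
                    yield c, o, i


def single_table_result():
    """Traditional SQL result: one denormalized row per matching combination."""
    return [c + o + i for c, o, i in matching_combinations()]


def result_db():
    """Subdatabase: per relation, the distinct tuples that occur in some join result row."""
    sub = {"customers": set(), "orders": set(), "items": set()}
    for c, o, i in matching_combinations():
        sub["customers"].add(c)
        sub["orders"].add(o)
        sub["items"].add(i)
    return {name: sorted(rows) for name, rows in sub.items()}


if __name__ == "__main__":
    print(len(single_table_result()), "denormalized rows")
    for name, rows in result_db().items():
        print(name, rows)
```

In this toy data the single-table result repeats Ada's tuple on three rows, while the subdatabase holds it once. Bob's order 12 has no items and Cy has no orders, so neither appears in any of the returned tables.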

Benefits of the RESULTDB Approach

The RESULTDB extension offers several advantages:

Simplified Application Development: By receiving results as a subdatabase, application developers can more easily manipulate the data without writing complex transformation logic.
Improved Query Performance: Avoiding the creation of denormalized join results can lead to smaller intermediate tables, reducing memory consumption and potentially enhancing query performance.
Reduced Data Redundancy and Transmission Costs: As the subdatabase contains only the necessary subsets of tables, data redundancy is minimized, and the cost of transmitting the query results is reduced.
Preservation of Relational Information: Returning a subdatabase preserves key and relational information, maintaining the integrity and relationships inherent in the original schema.
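The redundancy and transmission argument can be made concrete with a rough count. The sketch below uses Python's sqlite3 module with a made-up two-table schema and counts individual values as a crude proxy for transfer size; SQLite has no RESULTDB keyword, so the subdatabase is emulated with one SELECT DISTINCT per table. The numbers say nothing about the paper's actual measurements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    """
)
# One customer with 100 orders (hypothetical data).
conn.execute("INSERT INTO customers VALUES (1, 'Ada', '1 Example Street')")
conn.executemany(
    "INSERT INTO orders VALUES (?, 1, '2023-12-01')",
    [(order_id,) for order_id in range(100)],
)

JOIN = "FROM customers c JOIN orders o ON o.customer_id = c.customer_id"


def cell_count(rows):
    """Number of individual values in a result set (a rough proxy for transfer size)."""
    return sum(len(row) for row in rows)


# Single-table result: the customer's columns are repeated on all 100 rows.
single_table = conn.execute(
    f"SELECT c.customer_id, c.name, c.address, o.order_id, o.order_date {JOIN}"
).fetchall()

# Emulated subdatabase: each participating tuple is returned once, in its own table.
sub_customers = conn.execute(f"SELECT DISTINCT c.* {JOIN}").fetchall()
sub_orders = conn.execute(f"SELECT DISTINCT o.* {JOIN}").fetchall()

single_cells = cell_count(single_table)
subdb_cells = cell_count(sub_customers) + cell_count(sub_orders)
print(single_cells, subdb_cells)
```

With this data the denormalized result carries 500 values and the two-table subdatabase 303, even though the orders table keeps its customer_id column so the client can still relate the tables. The gap widens as the repeated side gets wider or the fan-out grows.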

Implementation Methods

The paper discusses several methods for integrating the RESULTDB functionality into existing SQL systems, categorized into SQL-based rewrites and direct integration into DBMS query optimizers:

Dynamic SELECT DISTINCT: This method transforms the original query into multiple SELECT DISTINCT queries, each reducing the dataset to the tuples that participate in the join. While this approach is straightforward, it risks repeatedly computing expensive joins.
Materialized SELECT DISTINCT: Here, the query result is stored once as a materialized view (MV), and one SELECT DISTINCT query per table is then issued against the MV. This avoids recomputing the join, at the cost of materialization overhead.
Dynamic Subquery: Subqueries are used to enforce semi-join semantics dynamically, guiding the optimizer to use semi-joins instead of full inner joins, though this approach depends heavily on the optimizer's capabilities.
Materialized Subquery: This method prematerializes parts of the join in separate materialized views, optimizing for semi-join strategies to balance computation and storage costs.
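A minimal sketch of three of these rewrites follows, using Python's sqlite3 module. The schema, data, and function names are hypothetical, and the exact SQL the paper generates may differ. Two assumptions are worth flagging: SQLite has no materialized views, so a temporary table stands in for the MV, and the subquery variant is written with EXISTS as one common way to express a semi-join. The Materialized Subquery variant is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 80), (11, 1, 20), (12, 2, 30), (13, 3, 99), (14, 9, 70);
    """
)

# Body of the original single-table query: SELECT * <JOIN>
JOIN = (
    "FROM customers c JOIN orders o ON o.customer_id = c.customer_id "
    "WHERE o.total > 50"
)


def fetch(sql):
    """Run a query and return its rows in a deterministic order."""
    return sorted(conn.execute(sql).fetchall())


def dynamic_select_distinct():
    """One SELECT DISTINCT per table; the join is recomputed for every table."""
    return {
        "customers": fetch(f"SELECT DISTINCT c.* {JOIN}"),
        "orders": fetch(f"SELECT DISTINCT o.* {JOIN}"),
    }


def materialized_select_distinct():
    """Compute the join once, store it, then project each table out of the stored result."""
    conn.execute("DROP TABLE IF EXISTS joined")
    # Temporary table as a stand-in for a materialized view (assumption: SQLite lacks MVs).
    conn.execute(
        "CREATE TEMP TABLE joined AS "
        "SELECT c.customer_id AS c_customer_id, c.name AS c_name, "
        "o.order_id AS o_order_id, o.customer_id AS o_customer_id, o.total AS o_total "
        f"{JOIN}"
    )
    return {
        "customers": fetch("SELECT DISTINCT c_customer_id, c_name FROM joined"),
        "orders": fetch("SELECT DISTINCT o_order_id, o_customer_id, o_total FROM joined"),
    }


def dynamic_subquery():
    """Express each table's subset as a semi-join via EXISTS; no join result is built."""
    return {
        "customers": fetch(
            "SELECT * FROM customers c WHERE EXISTS ("
            "SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.total > 50)"
        ),
        "orders": fetch(
            "SELECT * FROM orders o WHERE o.total > 50 AND EXISTS ("
            "SELECT 1 FROM customers c WHERE c.customer_id = o.customer_id)"
        ),
    }


if __name__ == "__main__":
    print(dynamic_select_distinct())
    print(materialized_select_distinct())
    print(dynamic_subquery())
```

All three functions return the same two subsets here: customers Ada and Cy, and orders 10 and 13. Order 14 passes the filter but references no existing customer, so every variant drops it. The variants differ only in how much join work the database repeats or stores.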

Algorithmic Approach

For more efficient processing, the paper introduces an algorithm based on Yannakakis' algorithm for acyclic join graphs, which removes non-participating tuples through semi-joins instead of building the full join result. To address cyclic joins, a folding method is proposed that transforms a cyclic join graph into an acyclic one, so that the same semi-join reduction can be applied. This approach retains the relational context and avoids materializing a denormalized join result.
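The semi-join reduction at the core of Yannakakis' algorithm can be sketched in a few lines of Python. This is a textbook-style illustration under simplifying assumptions, not the paper's implementation: the join tree is given by hand, joined relations share column names, and the relations, the function names, and the `tree_edges` format are hypothetical. Folding of cyclic join graphs is not covered.

```python
def semijoin(left, right, keys):
    """Keep the rows of `left` that have at least one join partner in `right` on `keys`."""
    partners = {tuple(row[k] for k in keys) for row in right}
    return [row for row in left if tuple(row[k] for k in keys) in partners]


def yannakakis_reduce(relations, tree_edges):
    """Full semi-join reduction over a rooted join tree.

    `tree_edges` lists (parent, child, shared_keys) top-down, so that every
    parent appears as a child (or is the root) before its own children.
    """
    reduced = dict(relations)
    # Bottom-up pass: a parent keeps only rows matched by its already reduced child.
    for parent, child, keys in reversed(tree_edges):
        reduced[parent] = semijoin(reduced[parent], reduced[child], keys)
    # Top-down pass: a child keeps only rows matched by its fully reduced parent.
    for parent, child, keys in tree_edges:
        reduced[child] = semijoin(reduced[child], reduced[parent], keys)
    return reduced


# Hypothetical acyclic join: customers - orders - items.
relations = {
    "customers": [
        {"customer_id": 1, "name": "Ada"},
        {"customer_id": 2, "name": "Bob"},
        {"customer_id": 3, "name": "Cy"},
    ],
    "orders": [
        {"order_id": 10, "customer_id": 1},
        {"order_id": 11, "customer_id": 1},
        {"order_id": 12, "customer_id": 2},
        {"order_id": 13, "customer_id": 9},
    ],
    "items": [
        {"item_id": 100, "order_id": 10},
        {"item_id": 101, "order_id": 10},
        {"item_id": 102, "order_id": 11},
        {"item_id": 103, "order_id": 77},
    ],
}
tree_edges = [
    ("customers", "orders", ["customer_id"]),
    ("orders", "items", ["order_id"]),
]

reduced = yannakakis_reduce(relations, tree_edges)

if __name__ == "__main__":
    for name, rows in reduced.items():
        print(name, rows)
```

After both passes, each relation of an acyclic join contains exactly the tuples that take part in at least one join result row, which matches the subdatabase a RESULTDB query is meant to return (before any projection). No denormalized join result is ever built; each pass only shrinks the base relations.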

Implications and Future Work

The implications of this research are significant for both theoretical and practical aspects of database management. Theoretically, it reintroduces semi-join reduction in a novel context, ensuring minimal data redundancy. Practically, it provides a more efficient method of interacting with databases, which could lead to considerable performance improvements in real-world applications.

Future developments could focus on optimizing the root selection and join order in semi-join reductions to further maximize the efficiency of the proposed methods. Additionally, exploring database transformation techniques could complement the data retrieval enhancements discussed in this paper.

Conclusion

The proposal and the initial experiments in the paper are a step toward making SQL more efficient and easier to use. By addressing the issues that stem from the single-table result paradigm, the RESULTDB extension offers a cleaner way to return the results of queries over multiple tables, and the paper reports promising results from its initial experimental study.
