What if an SQL Statement Returned a Database? (2312.00638v1)
Abstract: Every SQL statement is limited to returning a single, possibly denormalized, table. This design decision has far-reaching consequences: (1) for database users, in terms of slow query performance, long query result transfer times, and usability issues of SQL in web applications and object-relational mappers; and (2) for database architects, when designing query optimizers, in terms of logical (algebraic) join enumeration effort, memory consumption for intermediate result materialization, and physical operator selection effort. In effect, the entire query optimization stack is shaped by that design decision. In this paper, we argue that the single-table limitation should be dropped. We extend the SELECT clause of SQL by a keyword 'RESULTDB' to support returning a result database. Our approach has clear semantics, i.e., our extended SQL returns subsets of all tables with only those tuples that would be part of the traditional (single-table) query result set, but without performing any denormalization through joins. Our SQL extension is backward compatible. Moreover, we discuss the surprisingly long list of benefits of our approach. First, for database users: far simpler and more readable application code, better query performance, smaller query results, and shorter query result transfer times. Second, for database architects: we show how to leverage existing closed-source systems, and how to change open-source database systems, to support our feature, and we propose a couple of algorithms for integrating it into both kinds of systems. We present an initial experimental study with promising results.
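The semantics described in the abstract can be illustrated with a small sketch. It does not use the 'RESULTDB' keyword, which stock database systems do not support; instead it assumes a hand-written rewrite into one `EXISTS` (semi-join style) query per base table, run on SQLite through Python's `sqlite3` module. The schema (`customers`, `orders`), the data, and the helper names `result_db` and `demo` are hypothetical, and the rewrite is not necessarily one of the algorithms proposed in the paper.

```python
import sqlite3


def result_db(conn):
    """Emulate a result database for one fixed two-table join query (illustrative only)."""
    # Traditional single-table result: one denormalized row per (customer, order) pair.
    joined = conn.execute(
        "SELECT c.id, c.name, o.id, o.amount "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "WHERE o.amount > 100 ORDER BY c.id, o.id"
    ).fetchall()
    # Result database: for each base table, keep only the tuples that
    # contribute to the join result above, without joining them together.
    customers = conn.execute(
        "SELECT c.id, c.name FROM customers c "
        "WHERE EXISTS (SELECT 1 FROM orders o "
        "              WHERE o.customer_id = c.id AND o.amount > 100) "
        "ORDER BY c.id"
    ).fetchall()
    orders = conn.execute(
        "SELECT o.id, o.customer_id, o.amount FROM orders o "
        "WHERE o.amount > 100 "
        "AND EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id) "
        "ORDER BY o.id"
    ).fetchall()
    return joined, {"customers": customers, "orders": orders}


def demo():
    # Hypothetical toy schema and data, held in an in-memory SQLite database.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO orders VALUES
            (10, 1, 150.0), (11, 1, 200.0), (12, 2, 50.0), (13, 1, 120.0);
        """
    )
    try:
        return result_db(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    joined, db = demo()
    print(len(joined), "denormalized rows")
    print(len(db["customers"]), "customer tuples,", len(db["orders"]), "order tuples")
```

In this toy data the denormalized result repeats the same customer once per matching order, whereas the per-table subsets hold each contributing customer tuple only once. That repetition is the kind of redundancy the abstract attributes to single-table results.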