Quantifying Semantic Query Similarity for Automated Linear SQL Grading: A Graph-based Approach (2403.14441v1)
Abstract: Quantifying the semantic similarity between database queries is a critical challenge with broad applications, ranging from query log analysis to automated educational assessment of SQL skills. Traditional methods often rely solely on syntactic comparisons or are limited to checking for semantic equivalence. This paper introduces a novel graph-based approach to measure the semantic dissimilarity between SQL queries. Queries are represented as nodes in an implicit graph, and the transitions between nodes, called edits, are weighted by their semantic dissimilarity. We employ shortest path algorithms to identify the lowest-cost edit sequence between two given queries, which defines a quantifiable measure of semantic distance. We evaluated a prototype implementation of this technique in an empirical study, which strongly suggests that our method grades more accurately and comprehensibly than existing techniques. Moreover, the results indicate that our approach comes close to the quality of manual grading, making it a robust tool for diverse database query comparison tasks.
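The abstract describes the core idea only at a high level. The following is a minimal sketch of it under strong simplifying assumptions, not the authors' implementation: the `Query` model (a query reduced to three sets of clause fragments), the add/remove and replace edits, and the cost tables `ADD_REMOVE_COST` and `REPLACE_COST` are all hypothetical stand-ins for the paper's semantic-dissimilarity weights. Uniform-cost search (Dijkstra's algorithm run on a graph whose nodes are generated on demand) then finds the cheapest edit sequence from a submission to a reference query.

```python
import heapq
import itertools
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, Iterator, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical toy query model (an assumption, not the paper's):
    a query is reduced to three sets of clause fragments."""
    select: FrozenSet[str]
    tables: FrozenSet[str]
    where: FrozenSet[str]


# Hypothetical edit costs standing in for "semantic dissimilarity" weights.
ADD_REMOVE_COST: Dict[str, float] = {"select": 1.0, "tables": 3.0, "where": 2.0}
REPLACE_COST: Dict[str, float] = {"select": 0.75, "tables": 2.5, "where": 1.5}


def neighbors(q: Query, vocab: Query) -> Iterator[Tuple[Query, float]]:
    """Lazily yield (query, cost) pairs that are one edit away from q.

    The graph is implicit: nodes are generated on demand, never stored up front.
    Candidate fragments are limited to `vocab` so the search space stays finite.
    """
    for part, cost in ADD_REMOVE_COST.items():
        current = getattr(q, part)
        candidates = getattr(vocab, part)
        for frag in candidates:
            # Add frag if absent, remove it if present (symmetric difference).
            yield replace(q, **{part: current ^ {frag}}), cost
            if frag in current:
                # Replace a present fragment by an absent one in a single edit.
                for other in candidates - current:
                    new_part = (current - {frag}) | {other}
                    yield replace(q, **{part: new_part}), REPLACE_COST[part]


def semantic_distance(start: Query, goal: Query) -> float:
    """Cost of the cheapest edit sequence from start to goal.

    Uniform-cost search, i.e. Dijkstra's algorithm on an implicit graph.
    """
    vocab = Query(start.select | goal.select,
                  start.tables | goal.tables,
                  start.where | goal.where)
    tie = itertools.count()  # tie-breaker so heapq never compares Query objects
    frontier = [(0.0, next(tie), start)]
    best = {start: 0.0}
    while frontier:
        dist, _, q = heapq.heappop(frontier)
        if q == goal:
            return dist  # all costs are positive, so the first pop is optimal
        if dist > best[q]:
            continue  # stale entry superseded by a cheaper path
        for nxt, cost in neighbors(q, vocab):
            new_dist = dist + cost
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                heapq.heappush(frontier, (new_dist, next(tie), nxt))
    return float("inf")  # not reached: toggle edits can always reach goal


reference = Query(frozenset({"name"}), frozenset({"student"}),
                  frozenset({"age > 20"}))
submission = Query(frozenset({"name", "age"}), frozenset({"student"}),
                   frozenset({"age >= 20"}))
print(semantic_distance(submission, reference))  # 2.5
```

In this toy model the replace edit is what makes a search worthwhile: turning `age >= 20` into `age > 20` costs 1.5 as a single replacement instead of 4.0 as a removal plus an addition, so the cheapest path is not simply a sum of per-fragment differences. Generating neighbors lazily keeps the full space of candidate queries from ever being materialized.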
Authors: Leo Köberlein, Dominik Probst, Richard Lenz