
Quantifying Semantic Query Similarity for Automated Linear SQL Grading: A Graph-based Approach (2403.14441v1)

Published 21 Mar 2024 in cs.DB

Abstract: Quantifying the semantic similarity between database queries is a critical challenge with broad applications, ranging from query log analysis to automated educational assessment of SQL skills. Traditional methods often rely solely on syntactic comparisons or are limited to checking for semantic equivalence. This paper introduces a novel graph-based approach to measure the semantic dissimilarity between SQL queries. Queries are represented as nodes in an implicit graph, while the transitions between nodes are called edits, which are weighted by semantic dissimilarity. We employ shortest path algorithms to identify the lowest-cost edit sequence between two given queries, thereby defining a quantifiable measure of semantic distance. A prototype implementation of this technique has been evaluated through an empirical study, which strongly suggests that our method provides more accurate and comprehensible grading compared to existing techniques. Moreover, the results indicate that our approach comes close to the quality of manual grading, making it a robust tool for diverse database query comparison tasks.
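The idea in the abstract (queries as nodes of an implicit graph, weighted edits as transitions, and a shortest path as the distance) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy query model (a set of selected columns plus a set of WHERE predicates), the two edit types, their costs, and the names `neighbors` and `semantic_distance`. The search is a plain uniform-cost search that generates neighbors on demand, so the graph is never materialized.

```python
import heapq
from itertools import count

# Hypothetical toy model: a query is a pair
# (frozenset of selected columns, frozenset of WHERE predicates).
# The edit types and costs are illustrative assumptions, not the paper's.
COLUMN_EDIT_COST = 1.0
PREDICATE_EDIT_COST = 2.0


def neighbors(query, columns, predicates):
    """Yield (cost, description, next_query) for every single edit of `query`."""
    selected, where = query
    for col in columns:
        verb = "remove" if col in selected else "add"
        # Symmetric difference toggles membership and keeps the frozenset type.
        yield COLUMN_EDIT_COST, f"{verb} column {col}", (selected ^ {col}, where)
    for pred in predicates:
        verb = "remove" if pred in where else "add"
        yield PREDICATE_EDIT_COST, f"{verb} predicate {pred}", (selected, where ^ {pred})


def semantic_distance(source, target):
    """Uniform-cost search over the implicit edit graph.

    Returns (total cost, list of edit descriptions) for a cheapest edit
    sequence that turns `source` into `target`.
    """
    # Restrict edits to the vocabulary of the two queries so the graph is finite.
    columns = sorted(source[0] | target[0])
    predicates = sorted(source[1] | target[1])

    tie = count()  # unique tie-breaker so the heap never compares queries
    frontier = [(0.0, next(tie), source, [])]
    settled = set()
    while frontier:
        cost, _, query, edits = heapq.heappop(frontier)
        if query == target:
            return cost, edits
        if query in settled:
            continue
        settled.add(query)
        for step_cost, description, nxt in neighbors(query, columns, predicates):
            if nxt not in settled:
                heapq.heappush(
                    frontier,
                    (cost + step_cost, next(tie), nxt, edits + [description]),
                )
    return float("inf"), []


student = (frozenset({"name", "age"}), frozenset({"age > 18"}))
reference = (frozenset({"name"}), frozenset({"age >= 18"}))
distance, edit_sequence = semantic_distance(student, reference)
print(distance, edit_sequence)
```

Under these assumed costs, the student query is three edits away from the reference (one column removal and a predicate swap modeled as a removal plus an addition), for a total distance of 5.0. The returned edit sequence is what would make a grade explainable to a student; a real system would need a far richer query representation and cost model than this toy.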

Authors (3)
  1. Leo Köberlein
  2. Dominik Probst
  3. Richard Lenz
