Semantic Parsing for Complex Data Retrieval: Targeting Query Plans vs. SQL for No-Code Access to Relational Databases (2312.14798v1)
Abstract: Large language models (LLMs) have spurred progress in text-to-SQL, the task of generating SQL queries from natural language questions given a database schema. Although SQL is declarative, it remains a complex programming language. In this paper, we investigate an alternative query language with simpler syntax and modular specification of complex queries. The goal is a query language that modern neural semantic parsing architectures can learn more easily, and that lets non-programmers better assess the validity of the query plans produced by an interactive query plan assistant. The proposed language is called Query Plan Language (QPL). It is designed to be modular and can be translated into a restricted form of SQL Common Table Expressions (CTEs). QPL aims to make complex data retrieval accessible to non-programmers: users express their questions in natural language, and the system produces a target language that is easier to verify. The paper demonstrates how LLMs can benefit from QPL's modularity to generate complex query plans compositionally, through a question decomposition strategy and a planning stage. We conduct experiments on a version of the Spider text-to-SQL dataset that has been converted to QPL. The hierarchical structure of QPL programs provides a natural measure of query complexity. Using this measure, we find that existing text-to-SQL systems have low accuracy on complex compositional queries. We present ways to address complex queries in an iterative, user-controlled manner, using fine-tuned LLMs and a variety of prompting strategies.
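To make the idea of a modular plan that translates into CTEs more concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It does not use the paper's actual QPL syntax: the `steps` list, the `plan_to_cte` and `run_demo` helpers, and the `employee`/`department` tables are all hypothetical and exist only for illustration. Each step is a small named query that reads only from base tables or earlier steps, and the whole plan is chained into a single `WITH` query.

```python
import sqlite3

# Hypothetical plan (NOT the paper's QPL syntax): each step is a
# (name, SQL body) pair that reads only base tables or earlier steps.
steps = [
    ("step1", "SELECT id, dept_id, salary FROM employee"),
    ("step2", "SELECT dept_id, AVG(salary) AS avg_salary "
              "FROM step1 GROUP BY dept_id"),
    ("step3", "SELECT d.name, s.avg_salary "
              "FROM step2 AS s JOIN department AS d ON d.id = s.dept_id "
              "WHERE s.avg_salary > 100"),
]


def plan_to_cte(plan):
    """Chain the steps as CTEs and select everything from the last one."""
    ctes = ",\n".join(f"{name} AS ({body})" for name, body in plan)
    return f"WITH {ctes}\nSELECT * FROM {plan[-1][0]}"


def run_demo():
    """Run the compiled plan against a tiny in-memory toy database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE employee (id INTEGER PRIMARY KEY,
                               dept_id INTEGER, salary REAL);
        INSERT INTO department VALUES (1, 'Sales'), (2, 'Research');
        INSERT INTO employee VALUES
            (1, 1, 90), (2, 1, 100), (3, 2, 120), (4, 2, 140);
    """)
    rows = conn.execute(plan_to_cte(steps)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(plan_to_cte(steps))
    print(run_demo())
```

Because every step has a name and a single purpose, a prefix of the plan (for example `steps[:2]`) can be compiled and run on its own, which is one plausible way a reader could check intermediate results step by step.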
Authors: Ben Eyal, Amir Bachar, Ophir Haroche, Michael Elhadad