Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations (2305.07372v5)
Abstract: Relational databases play an important role in business, science, and more. However, many users cannot fully unleash the analytical power of relational databases, because they are not familiar with database languages such as SQL. Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly for complex queries, and (2) they do not provide a flexible way for non-expert users to validate and refine incorrect queries. To address these issues, we introduce a new interaction mechanism that allows users to directly edit a step-by-step explanation of a query to fix errors. Our experiments on multiple datasets, as well as a user study with 24 participants, demonstrate that our approach can achieve better performance than multiple SOTA approaches. Our code and datasets are available at https://github.com/magic-YuanTian/STEPS.