Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval (2309.10506v1)
Abstract: Open-domain table question answering aims to answer a question by retrieving and extracting information from a large collection of tables. Existing open-domain table QA methods either directly adopt text retrieval methods or consider the table structure only in the encoding layer for table retrieval, which may cause a loss of syntactical and structural information during table scoring. To address this issue, we propose a syntax- and structure-aware retrieval method for the open-domain table QA task. It provides syntactical representations for the question and uses structural header and value representations for the tables, avoiding the loss of fine-grained syntactical and structural information. A syntactical-to-structural aggregator then computes the matching score between the question and a candidate table by mimicking the human retrieval process. Experimental results show that our method achieves state-of-the-art results on the NQ-tables dataset and substantially outperforms strong baselines on a newly curated open-domain Text-to-SQL dataset.
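The abstract does not spell out how the aggregator combines the two sides, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the question has already been split into syntactic units (for example, phrases from a parser) and that every unit, table header, and cell value has already been encoded into a dense vector. The hypothetical `aggregate_score` function scores a table in a MaxSim late-interaction style (as popularized by ColBERT): each question unit takes its best cosine match among the headers and among the values, the two are mixed with an assumed `header_weight`, and the per-unit scores are summed. The hypothetical `rank_tables` then sorts candidate tables by that score.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def aggregate_score(
    question_units: List[Vector],
    header_vecs: List[Vector],
    value_vecs: List[Vector],
    header_weight: float = 0.5,  # assumed mixing weight, not from the paper
) -> float:
    """Hypothetical syntactical-to-structural aggregation.

    For each question unit, find its best-matching header and its
    best-matching cell value, mix the two similarities, and sum the
    per-unit scores over all units (MaxSim-style late interaction).
    """
    total = 0.0
    for q in question_units:
        best_header = max((cosine(q, h) for h in header_vecs), default=0.0)
        best_value = max((cosine(q, v) for v in value_vecs), default=0.0)
        total += header_weight * best_header + (1.0 - header_weight) * best_value
    return total


def rank_tables(
    question_units: List[Vector],
    tables: Dict[str, Tuple[List[Vector], List[Vector]]],
    header_weight: float = 0.5,
) -> List[str]:
    """Return table names sorted by descending aggregated score."""
    scores = {
        name: aggregate_score(question_units, headers, values, header_weight)
        for name, (headers, values) in tables.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy 3-d "embeddings": unit 1 matches table_a's header, unit 2 its value.
    question = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    tables = {
        "table_a": ([[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]),
        "table_b": ([[0.0, 0.0, 1.0]], [[0.0, 0.0, 1.0]]),
    }
    print(rank_tables(question, tables))  # ['table_a', 'table_b']
```

Matching headers and values independently keeps the sketch short; a closer imitation of a person scanning the headers and then reading down the matched column would restrict each value match to the column whose header matched best.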
- Nengzheng Jin
- Dongfang Li
- Junying Chen
- Joanna Siebert
- Qingcai Chen