Augment before You Try: Knowledge-Enhanced Table Question Answering via Table Expansion (2401.15555v1)

Published 28 Jan 2024 in cs.CL

Abstract: Table question answering is a popular task that assesses a model's ability to understand and interact with structured data. However, the given table often does not contain sufficient information for answering the question, necessitating the integration of external knowledge. Existing methods either convert both the table and external knowledge into text, which neglects the structured nature of the table; or they embed queries for external sources in the interaction with the table, which complicates the process. In this paper, we propose a simple yet effective method to integrate external information in a given table. Our method first constructs an augmenting table containing the missing information and then generates a SQL query over the two tables to answer the question. Experiments show that our method outperforms strong baselines on three table QA benchmarks. Our code is publicly available at https://github.com/UCSB-NLP-Chang/Augment_tableQA.
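
To make the pipeline described in the abstract concrete, here is a minimal sketch of the two-step idea: materialize the missing knowledge as a second ("augmenting") table, then answer the question with a SQL query that joins the given table with it. All table names, columns, and data below are hypothetical illustrations, not the paper's actual prompts or datasets; in the paper, both the augmenting table's contents and the SQL query would be generated by an LLM.

```python
import sqlite3

# Toy question: "Which country in the table has London as its capital?"
# The given table lists countries but does not contain capitals, so the
# question cannot be answered from it alone.

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The table given with the question (hypothetical example data).
cur.execute("CREATE TABLE countries (name TEXT, population INTEGER)")
cur.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [("United Kingdom", 67_000_000), ("France", 68_000_000)],
)

# Step 1: an augmenting table holding the missing information. In the
# paper this content would come from an LLM (external knowledge); here
# it is hard-coded for illustration.
cur.execute("CREATE TABLE capitals (country TEXT, capital TEXT)")
cur.executemany(
    "INSERT INTO capitals VALUES (?, ?)",
    [("United Kingdom", "London"), ("France", "Paris")],
)

# Step 2: a SQL query over the two tables (in the paper, also
# LLM-generated) joins the given table with the augmenting table.
cur.execute("""
    SELECT c.name
    FROM countries AS c
    JOIN capitals AS k ON k.country = c.name
    WHERE k.capital = 'London'
""")
print(cur.fetchone()[0])  # -> United Kingdom
```

Because the external knowledge is expressed as a table rather than free text, the final reasoning step stays a single structured SQL query, which is the simplification the abstract claims over text-conversion or interleaved-query approaches.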

Authors (9)
  1. Yujian Liu (15 papers)
  2. Jiabao Ji (13 papers)
  3. Tong Yu (119 papers)
  4. Ryan Rossi (67 papers)
  5. Sungchul Kim (65 papers)
  6. Handong Zhao (38 papers)
  7. Ritwik Sinha (17 papers)
  8. Yang Zhang (1129 papers)
  9. Shiyu Chang (120 papers)
Citations (2)
