TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data (2401.13223v3)
Abstract: In this work, we address question answering (QA) over a hybrid of tabular and textual data, a very common form of content on the Web (e.g., SEC filings), where discrete reasoning capabilities are often required. Large language models (LLMs) such as GPT-4 have recently demonstrated strong multi-step reasoning capabilities, so we consider harnessing the power of LLMs to solve our task. We abstract a Step-wise Pipeline for tabular and textual QA, consisting of three key steps: Extractor, Reasoner, and Executor. We initially design an instruction to instantiate this pipeline and validate that GPT-4 outperforms all existing methods. However, relying on an online LLM such as GPT-4 poses challenges in terms of cost, latency, and data security risk, which motivates us to specialize smaller LLMs for this task. We develop TAT-LLM, a specialized language model, by fine-tuning LLaMA 2 on training data generated automatically from existing expert-annotated datasets following the Step-wise Pipeline. Experimental results verify that TAT-LLM outperforms all baseline models, including both the previous best fine-tuned models and very large-scale LLMs such as GPT-4, on the FinQA, TAT-QA, and TAT-DQA benchmarks.
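The Step-wise Pipeline described above can be sketched as three composed functions. This is only an illustrative toy, not the paper's implementation: in TAT-LLM each step is produced by the fine-tuned LLM itself, whereas here the extraction and equation-building are hypothetical rule-based stand-ins on a tiny table.

```python
# Illustrative sketch of the three-step pipeline (Extractor -> Reasoner ->
# Executor). The helper names and the rule-based logic are hypothetical
# stand-ins for the LLM-generated steps described in the paper.
from typing import Dict


def extractor(question: str, table: Dict[str, float]) -> Dict[str, float]:
    """Step 1: pull the evidence (table cells / text spans) relevant to the question."""
    return {k: v for k, v in table.items() if k in question}


def reasoner(question: str, evidence: Dict[str, float]) -> str:
    """Step 2: turn the evidence into a symbolic expression (an 'equation')."""
    vals = list(evidence.values())
    if "change" in question and len(vals) == 2:
        return f"{vals[1]} - {vals[0]}"
    raise ValueError("question pattern not handled in this sketch")


def executor(expression: str) -> float:
    """Step 3: evaluate the expression to obtain the final answer."""
    return eval(expression)  # acceptable here; expressions come from our own reasoner


table = {"revenue 2019": 120.0, "revenue 2020": 150.0}
question = "What is the change from revenue 2019 to revenue 2020?"
evidence = extractor(question, table)
answer = executor(reasoner(question, evidence))
print(answer)  # 30.0
```

Separating the symbolic expression (Reasoner) from its evaluation (Executor) is what makes the pipeline's discrete reasoning auditable: the intermediate equation can be checked independently of the final number.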
- Fengbin Zhu
- Ziyang Liu
- Fuli Feng
- Chao Wang
- Moxin Li
- Tat-Seng Chua