ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context (2403.02177v2)
Abstract: Tables play a crucial role in conveying information in various domains. We propose a Plan-then-Reason framework to answer different types of user queries over tables with sentence context. The framework first plans the reasoning paths over the context, then assigns each step to program-based or textual reasoning to reach the final answer. This framework enhances the table reasoning abilities of both in-context learning and fine-tuning methods. GPT-3.5-Turbo following the Plan-then-Reason framework surpasses other prompting baselines without self-consistency while using fewer API calls and in-context demonstrations. We also construct an instruction-tuning set, TrixInstruct, to evaluate the effectiveness of fine-tuning with this framework. We present the ProTrix model family, obtained by fine-tuning models on TrixInstruct. Our experiments show that the ProTrix family generalizes to diverse unseen tabular tasks with only 6k training instances. We further demonstrate that ProTrix can generate accurate and faithful explanations to answer complex free-form questions. Our work underscores the importance of planning and reasoning abilities in building generalizable and interpretable models for tabular tasks. We open-source our dataset and models at https://github.com/WilliamZR/ProTrix.
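The framework described above separates planning from execution: a planning call decomposes the query into ordered steps, and each step is then resolved either by generated code over the table or by free-form textual inference over the sentence context. Below is a minimal sketch of such a two-stage pipeline, based only on the abstract's description; the prompt wording, the `llm` callable, and all helper logic are illustrative assumptions, not the authors' released implementation.

```python
# Minimal Plan-then-Reason sketch (illustrative; not the authors' code).
# `llm` is any text-in/text-out model client, e.g. a GPT-3.5-Turbo wrapper.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str
    mode: str  # "program" or "textual", assigned during the planning stage


def plan(llm: Callable[[str], str], table: str, context: str, query: str) -> List[Step]:
    """Planning stage: ask the model for ordered steps, each tagged with the
    reasoning mode it should use (program-based vs. textual)."""
    raw = llm(
        "Decompose the query into steps. Prefix each line with "
        "'program:' or 'textual:'.\n"
        f"Table:\n{table}\nContext:\n{context}\nQuery: {query}"
    )
    steps = []
    for line in raw.splitlines():
        mode, _, desc = line.partition(":")
        if mode.strip() in ("program", "textual") and desc.strip():
            steps.append(Step(desc.strip(), mode.strip()))
    return steps


def reason(llm: Callable[[str], str], table: str, context: str, steps: List[Step]) -> str:
    """Reasoning stage: resolve each planned step with its assigned method,
    threading intermediate results through to the final answer."""
    result = ""
    for step in steps:
        if step.mode == "program":
            # Program-based step: generate code over the table, then run it.
            # Convention (assumed here): the generated code sets `answer`.
            code = llm(f"Write Python that sets `answer` for: {step.description}\nTable:\n{table}")
            scope: dict = {}
            exec(code, scope)  # toy executor; sandbox generated code in practice
            result = str(scope.get("answer", result))
        else:
            # Textual step: free-form inference over the sentence context.
            result = llm(
                f"Step: {step.description}\nContext:\n{context}\n"
                f"Intermediate result: {result}\nAnswer:"
            )
    return result
```

A production pipeline would validate the plan format and sandbox the generated code rather than calling `exec` directly; the sketch only shows how planning once up front lets each step be dispatched to the reasoning mode it suits.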
Authors: Zirui Wu, Yansong Feng