
ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context (2403.02177v2)

Published 4 Mar 2024 in cs.CL

Abstract: Tables play a crucial role in conveying information in various domains. We propose a Plan-then-Reason framework to answer different types of user queries over tables with sentence context. The framework first plans the reasoning paths over the context, then assigns each step to program-based or textual reasoning to reach the final answer. This framework enhances the table reasoning abilities for both in-context learning and fine-tuning methods. GPT-3.5-Turbo following the Plan-then-Reason framework surpasses other prompting baselines without self-consistency while using fewer API calls and in-context demonstrations. We also construct an instruction tuning set, TrixInstruct, to evaluate the effectiveness of fine-tuning with this framework. We present the ProTrix model family by fine-tuning models on TrixInstruct. Our experiments show that the ProTrix family generalizes to diverse unseen tabular tasks with only 6k training instances. We further demonstrate that ProTrix can generate accurate and faithful explanations to answer complex free-form questions. Our work underscores the importance of planning and reasoning abilities for models on tabular tasks, with both generalizability and interpretability. We open-source our dataset and models at https://github.com/WilliamZR/ProTrix.

ProTrix: Enhancing Planning and Reasoning Abilities for Querying Over Tables with Contextual Sentence Information

Introduction to the Challenge

Tables are foundational structures for data representation, playing a pivotal role across numerous domains by efficiently encapsulating complex information. However, unlocking the insights they hold, especially when contextual sentences accompany them, can be a daunting task. This necessitates leveraging advanced LLMs capable of not just understanding but also reasoning over such mixed data formats to answer user queries effectively.

The Plan-then-Reason Framework

The core proposal of the paper is a novel Plan-then-Reason framework. This two-stage approach first outlines a reasoning pathway, dividing the task into components that can be addressed through either program-based reasoning or textual reasoning. Notably, the framework accommodates scenarios where a direct programmatic solution is infeasible, blending in contextual sentence information to bridge data gaps.

Planning: Here, the model embarks on a preliminary analysis to identify gaps between the user's query and available context, employing common knowledge or specific insights to map out an actionable plan. It delineates when and how to pivot between programmatic queries and textual analysis to gather the necessary information.
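The plan produced in this stage can be pictured as an ordered list of steps, each tagged with the reasoning mode that will handle it. The sketch below is illustrative only: the step types and field names are assumptions for exposition, not the paper's actual schema.

```python
from dataclasses import dataclass

@dataclass
class PlanStep:
    description: str  # what information this step should obtain
    mode: str         # "program" (SQL over the table) or "textual" (sentence context)

def build_plan() -> list[PlanStep]:
    # Hypothetical query: "Did the team win more home games than the article claims?"
    return [
        PlanStep("Count home wins from the results table", mode="program"),
        PlanStep("Extract the claimed number from the sentence context", mode="textual"),
        PlanStep("Compare the two counts to verify the claim", mode="textual"),
    ]

plan = build_plan()
modes = [step.mode for step in plan]
```

The key property is that each step carries an explicit mode, so the reasoning stage knows whether to generate SQL or to reason over the accompanying sentences.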

Reasoning: Following the blueprint laid out in the planning stage, the model executes the plan. It takes a dual approach, leveraging SQL queries for direct table manipulation and natural language processing for extracting and integrating nuanced information from sentences.
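A program-based step of this kind can be sketched by loading the table into an in-memory SQL engine and executing the model-generated query. The table contents and the SQL below are invented for illustration; they are not from the paper.

```python
import sqlite3

# Load a toy table into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (opponent TEXT, venue TEXT, outcome TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("Lions", "home", "win"), ("Bears", "away", "loss"), ("Hawks", "home", "win")],
)

# SQL a model might generate for the plan step "count home wins".
sql = "SELECT COUNT(*) FROM results WHERE venue = 'home' AND outcome = 'win'"
home_wins = conn.execute(sql).fetchone()[0]

# A subsequent textual-reasoning step would combine this number with facts
# extracted from the sentence context to reach the final answer.
print(home_wins)  # 2
```

Executing the SQL outside the model keeps the numerical computation exact, while textual steps absorb the information that the table alone cannot provide.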

TrixInstruct: A Dataset for Instruction Tuning

To implement and fine-tune models within this framework, the authors construct an instruction tuning set dubbed TrixInstruct. Distinctive for incorporating queries that defy purely programmatic solutions and require combining table and sentence data, this dataset is instrumental in teaching models the requisite planning and reasoning capabilities. The paper benchmarks these abilities across several datasets, demonstrating the adaptability and interpretability of models trained on this instruction set.
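A single TrixInstruct instance pairs a table and its sentence context with a query and a plan-annotated target output. The record below is a speculative sketch; the field names and content are assumptions, not the released schema.

```python
import json

# Hypothetical training instance: table + sentence context + query + target
# that interleaves a plan with the final answer.
instance = {
    "table": [["Year", "Champion"], ["2020", "Lakers"], ["2021", "Bucks"]],
    "sentences": ["The 2021 finals went to six games."],
    "query": "Which team won the championship in 2021?",
    "output": (
        "Plan: Step 1 (program) - select Champion where Year = 2021. "
        "Step 2 (textual) - state the answer. Answer: Bucks"
    ),
}

# Instances of this shape serialize cleanly to JSON lines for fine-tuning.
record = json.loads(json.dumps(instance))
```

Because the target output contains the plan as well as the answer, fine-tuning on such records teaches the model to emit its reasoning path, which is what gives ProTrix its interpretability.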

ProTrix: The Model

The embodiment of this framework, ProTrix, shows strong generalization across a spectrum of tabular tasks, outperforming or matching much larger models such as GPT-3.5-turbo while using substantially less training data. Its performance not only underlines the model's adeptness at generating accurate and faithful answers to complex queries but also underscores the role of the Plan-then-Reason framework in enhancing model interpretability.

Empirical Validation and Insights

Experimental evaluations paint a comprehensive picture of ProTrix's capabilities. With an array of benchmarks spanning short-form question answering, fact verification, and free-form question answering tasks, the model's performance is rigorously tested. Notably, ProTrix demonstrates a significant edge in scenarios demanding a blend of tabular and textual data processing, numerical reasoning, and multi-hop reasoning. Such results solidify the framework's effectiveness in fostering models that can navigate the intricacies of mixed data formats seamlessly.

An ablation study further clarifies the contribution of both planning and reasoning components to the model's success, emphasizing their synergistic value. Additionally, specific analysis of program-unsolvable queries showcases ProTrix's strength in leveraging common sense and conceptual understanding to fill informational gaps left by tables.

Future Projections and Considerations

Despite its strengths, ProTrix currently faces limitations, particularly around handling complex tables with hierarchical headers and queries spanning multiple tables. Addressing these challenges will be pivotal in broadening the model's applicability and enhancing its real-world utility. Moreover, refining the evaluation metrics to better align with the nuanced outputs generated by such advanced models remains an area ripe for exploration.

Concluding Remarks

ProTrix and the underlying Plan-then-Reason framework mark a significant stride toward equipping LLMs with the nuanced capabilities needed to tackle the multifaceted challenge of reasoning over tables with accompanying sentence context. By marrying programmatic precision with the depth of textual analysis, this approach not only elevates the model's performance across diverse querying tasks but also advances the interpretability and adaptability of LLMs in handling complex, real-world data structures.

Authors (2)
  1. Zirui Wu (13 papers)
  2. Yansong Feng (81 papers)