Conditions for Length Generalization in Learning Reasoning Skills (2311.16173v2)
Abstract: Reasoning is a fundamental capability of AI agents. Recently, LLMs have shown remarkable abilities to perform reasoning tasks. However, numerous evaluations of their reasoning capabilities have also revealed limitations. A notable limitation is length generalization: models trained on reasoning problems of smaller lengths or sizes struggle with problems of larger lengths or sizes. This potentially indicates a theoretical limitation of generalization in learning reasoning skills. These evaluations and observations motivated us to conduct a theoretical study of the length generalization problem. This work focuses on reasoning tasks that can be formulated as Markov dynamic processes (MDPs) and/or directed acyclic graphs (DAGs). It identifies and proves conditions that determine whether the length generalization problem can be solved for a reasoning task under a particular representation. Experiments are also conducted to verify the theoretical results.
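To make the setup concrete, below is a minimal, illustrative sketch (not the paper's construction; the task, constants, and helper names are hypothetical). It unrolls multi-digit addition into a chain-shaped DAG of per-digit reasoning steps, fits a toy learner only on short instances, and then checks its step predictions on strictly longer instances, which is exactly the train-short/test-long split that length generalization refers to.

```python
# Minimal, illustrative sketch (not the paper's construction): multi-digit
# addition viewed as a chain-shaped DAG of per-digit reasoning steps.
# "Length generalization" here means fitting a predictor only on instances of
# at most MAX_TRAIN_LEN digits and testing it on strictly longer instances.
# All names and constants below are hypothetical, chosen for illustration.
import random

MAX_TRAIN_LEN = 5   # training instances have at most this many digits
TEST_LEN = 12       # test instances are strictly longer

def addition_dag(a_digits, b_digits):
    """Unroll digit-wise addition (least-significant digit first) into a list
    of (node_inputs, node_output) pairs; each node reads a digit pair plus the
    incoming carry and emits the output digit, passing the carry forward."""
    carry, steps = 0, []
    for a, b in zip(a_digits, b_digits):
        s = a + b + carry
        steps.append(((a, b, carry), s % 10))
        carry = s // 10
    return steps

def make_instance(n_digits):
    a = [random.randint(0, 9) for _ in range(n_digits)]
    b = [random.randint(0, 9) for _ in range(n_digits)]
    return addition_dag(a, b)

# Toy "learner": memorize the local step function from short instances only.
step_table = {}
for _ in range(2000):
    for node_inputs, node_output in make_instance(random.randint(1, MAX_TRAIN_LEN)):
        step_table[node_inputs] = node_output

# Evaluate the learned step function on strictly longer instances.
errors = sum(
    step_table.get(node_inputs) != node_output
    for _ in range(200)
    for node_inputs, node_output in make_instance(TEST_LEN)
)
print(f"step errors on length-{TEST_LEN} instances:", errors)
```

In this toy setting, every node's computation depends only on a bounded local state (two digits and a carry), so the states seen in short training instances already cover everything reachable at larger lengths and the memorized step function extends to longer problems; such representation-dependent structure is roughly the flavor of condition the abstract refers to.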