Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure (2310.05452v2)
Abstract: Pre-trained LLMs have shown an extraordinary capacity to solve reasoning tasks, even those requiring a complex process with multiple sub-steps. However, given the vast generation space across all possible tasks, how a pretrained model acquires this reasoning ability remains an open question. We first propose that an intrinsic structural constraint on the generated sequences of language-based reasoning, which we call the template-content structure (T-C structure), is the key to explaining why LLMs can solve a large number of complex reasoning problems with limited training data: we show that this structure reduces the possible space from exponential to linear. Furthermore, by generalizing the structure to the hierarchical case, we demonstrate that models can achieve task composition, further reducing the space needed for learning from linear to logarithmic and thereby effectively learning complex reasoning that involves multiple steps. We provide both examples and a formal theory of the T-C structure, and we experimentally validate its existence in current LLMs and its effectiveness for reasoning.
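The space-reduction claim in the abstract can be illustrated with a rough counting sketch. This is a back-of-envelope illustration, not the paper's formal statement; the symbols $n$, $k$, $c$, $L$, $m$, and $N$ are introduced here purely for illustration:

```latex
% 1. Unconstrained generation: any length-n sequence over vocabulary \Sigma.
|\Sigma|^{n} \quad \text{possible outputs (exponential in } n\text{)}

% 2. Template-content constraint: a fixed template with k content slots,
%    each filled from at most c options, so what must be learned is the
%    template plus per-slot behavior rather than the whole sequence space:
O(k\,c) \quad \text{degrees of freedom (linear, not exponential)}

% 3. Hierarchical composition: a library of L sub-templates composed over
%    m steps covers up to L^{m} distinct tasks, so covering N tasks needs
%    only about
m \approx \log_{L} N \quad \text{composition steps (logarithmic in } N\text{)}
```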
- Haotong Yang
- Fanxu Meng
- Zhouchen Lin
- Muhan Zhang