What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Abstract: What makes a difference in the post-training of LLMs? We investigate the training patterns of different layers in LLMs through the lens of the gradient. We are specifically interested in how fast vs. slow thinking affects the layer-wise gradients, given the recent popularity of training LLMs on reasoning paths such as chain-of-thought (CoT) and process rewards. In our study, fast thinking without CoT leads to larger gradients and larger gradient differences across layers than slow thinking (detailed CoT), indicating the learning stability brought by the latter. Additionally, we study whether the gradient patterns can reflect the correctness of responses when training different LLMs using slow vs. fast thinking paths. The results show that the gradients of slow thinking can distinguish correct from irrelevant reasoning paths. For comparison, we conduct similar gradient analyses on non-reasoning knowledge-learning tasks, on which, however, trivially increasing the response length does not produce behaviors similar to slow thinking. Our study strengthens the fundamental understanding of LLM training and offers novel insights into its efficiency and stability, paving the way towards building a generalizable System-2 agent. Our code, data, and gradient statistics can be found at: https://github.com/MingLiiii/Layer_Gradient.
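The released repository contains the paper's exact setup and gradient statistics. As a rough illustration of the layer-wise gradient analysis described above, the sketch below compares per-layer gradient norms when training on an answer-only ("fast thinking") target versus a chain-of-thought ("slow thinking") target. It assumes a HuggingFace causal LM (GPT-2 as a stand-in), computes the loss over the full prompt+response sequence without masking prompt tokens, and aggregates gradients by Frobenius norm; these are simplifications for illustration, not the paper's exact models or metrics.

```python
# Minimal sketch: per-layer gradient norms for a "fast" (answer-only) vs.
# "slow" (CoT) training target. GPT-2 is a stand-in model; the paper's
# actual models, data, and gradient metrics may differ.
import torch
from collections import defaultdict
from transformers import AutoModelForCausalLM, AutoTokenizer

def layerwise_grad_norms(model, tokenizer, prompt, response):
    """One forward/backward pass on prompt+response; return the gradient
    Frobenius norm aggregated per transformer layer."""
    model.zero_grad()
    inputs = tokenizer(prompt + response, return_tensors="pt")
    # Standard causal-LM loss over the whole sequence (a simplification:
    # one could mask prompt tokens so only the response contributes).
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss.backward()

    sq_norms = defaultdict(float)
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        # GPT-2 layer parameters are named "transformer.h.<idx>...."
        parts = name.split(".")
        if len(parts) > 2 and parts[1] == "h":
            layer_idx = int(parts[2])
            sq_norms[layer_idx] += param.grad.float().norm().item() ** 2
    return {idx: v ** 0.5 for idx, v in sorted(sq_norms.items())}

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.train()

    prompt = ("Q: Natalia sold 48 clips in April and half as many in May. "
              "How many clips did she sell in total?\nA: ")
    fast = "72."  # answer only ("fast thinking")
    slow = ("In May she sold 48 / 2 = 24 clips, "
            "so in total she sold 48 + 24 = 72 clips.")  # CoT ("slow thinking")

    for label, resp in [("fast", fast), ("slow", slow)]:
        norms = layerwise_grad_norms(model, tok, prompt, resp)
        print(label, {k: round(v, 4) for k, v in norms.items()})
```

Comparing the two printed dictionaries layer by layer gives a crude analogue of the paper's observation that answer-only targets induce larger and more uneven gradients across layers than detailed CoT targets.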