Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? (2404.07066v5)
Abstract: Large language models (LLMs) have shown remarkable performance across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexity remain poorly understood. In this paper, we explore the hypothesis that LLMs process concepts of varying complexity in different layers, introducing the notion of "Concept Depth" to capture the observation that more complex concepts are typically acquired in deeper layers. Specifically, we categorize concepts by their level of abstraction, defining them in order of increasing complexity within factual, emotional, and inferential tasks. We conduct extensive probing experiments on layer-wise representations across several LLM families (Gemma, LLaMA, Qwen) and datasets spanning the three task domains. Our findings reveal that probes on shallow layers suffice for simpler tasks, whereas more complex tasks typically require deeper layers for accurate understanding. We also examine how external factors, such as adding noise to the input or quantizing the model weights, affect layer-wise representations; both can delay the emergence of conceptual understanding, pushing it to deeper layers. We hope that the proposed concept and our experimental insights will enhance the understanding of the mechanisms underlying LLMs. Our code is available at \url{https://github.com/Luckfort/CD}.
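The layer-wise probing setup described in the abstract can be made concrete with a small sketch: extract the hidden state of every layer for each labeled example, fit a simple linear probe per layer, and compare accuracies across depth. The snippet below is a minimal illustration rather than the paper's exact pipeline; the model name, last-token pooling, logistic-regression probe, and train/test split are all assumptions.

```python
# Minimal sketch of layer-wise linear probing (illustrative; the paper's exact
# protocol may differ). Assumes a HuggingFace causal LM and a small labeled
# dataset of (text, label) pairs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any decoder-only LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def layer_features(texts):
    """Return per-layer sentence representations (last-token hidden state per layer)."""
    feats = None
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).hidden_states  # tuple of (n_layers + 1) tensors [1, seq, dim]
        vecs = [h[0, -1].float().numpy() for h in hidden]  # last-token vector at each layer
        if feats is None:
            feats = [[] for _ in vecs]
        for i, v in enumerate(vecs):
            feats[i].append(v)
    return feats

def probe_accuracy_per_layer(texts, labels):
    """Fit a logistic-regression probe on each layer's features and report test accuracy."""
    feats = layer_features(texts)
    accs = []
    for layer_feats in feats:
        x_tr, x_te, y_tr, y_te = train_test_split(
            layer_feats, labels, test_size=0.3, random_state=0
        )
        clf = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
        accs.append(clf.score(x_te, y_te))
    # Accuracy that rises only at deeper layers suggests a "deeper" concept.
    return accs

# Hypothetical usage on a tiny binary task (data is illustrative, not from the paper):
# texts = ["Paris is in France.", "Paris is in Italy.", ...]
# labels = [1, 0, ...]
# print(probe_accuracy_per_layer(texts, labels))
```

Under this kind of setup, the layer at which probe accuracy saturates serves as a simple proxy for a task's Concept Depth; input noise or weight quantization would then be expected to shift that saturation point toward deeper layers.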
Authors: Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang