Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers (2312.04333v4)
Abstract: This paper presents an in-depth analysis of large language models (LLMs), focusing on LLaMA, a prominent open-source foundational model in natural language processing. Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding on high-order tasks such as reasoning and computation. We examine the model horizontally, comparing different sizes, and vertically, assessing different layers. Based on the designed probing tasks, we unveil several key and uncommon findings: (1) Horizontally, enlarging model size by itself hardly imparts additional knowledge or computational prowess; it can, however, enhance reasoning ability, especially in math problem solving, and help reduce hallucinations, but only beyond certain size thresholds. (2) Vertically, the lower layers of LLaMA lack substantial arithmetic and factual knowledge but exhibit logical thinking, multilingual, and recognition abilities, while the top layers house most of the computational power and real-world knowledge.
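The layer-wise probing described in the abstract can be pictured as: freeze the model, collect each layer's hidden representations for task inputs, and attach a lightweight classifier per layer. The snippet below is a minimal sketch of this idea, not the paper's exact protocol; it assumes the Hugging Face `transformers` library, scikit-learn, and a LLaMA-family checkpoint such as `meta-llama/Llama-2-7b-hf`, and the labeled arithmetic statements are hypothetical placeholders for the paper's probing tasks.

```python
# Minimal layer-wise probing sketch (illustrative, not the paper's exact setup):
# extract frozen hidden states from every LLaMA layer and fit a small linear
# probe per layer on toy labeled statements.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumption: any LLaMA-family checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(
    MODEL_NAME, output_hidden_states=True,
    torch_dtype=torch.float16, device_map="auto",
)
model.eval()

def per_layer_features(text: str) -> torch.Tensor:
    """Mean-pooled representation of `text` at every layer: (num_layers+1, hidden)."""
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        hidden = model(**inputs).hidden_states  # tuple: embedding layer + each block
    return torch.stack([h.mean(dim=1).squeeze(0).float().cpu() for h in hidden])

# Hypothetical binary probing data: arithmetic statement + correctness label.
examples = [
    ("17 + 25 = 42", 1), ("17 + 25 = 43", 0),
    ("9 * 6 = 54", 1),   ("9 * 6 = 56", 0),
]
feats = torch.stack([per_layer_features(t) for t, _ in examples])  # (N, L+1, H)
labels = [y for _, y in examples]

# One linear probe per layer; higher accuracy suggests that layer's representations
# encode the probed ability (a real experiment would evaluate on held-out data).
for layer in range(feats.size(1)):
    X = feats[:, layer, :].numpy()
    probe = LogisticRegression(max_iter=1000).fit(X, labels)
    print(f"layer {layer:2d}: train acc = {probe.score(X, labels):.2f}")
```

In the paper's framing, a layer whose frozen features support an accurate probe is taken to encode the corresponding ability; repeating this across model sizes gives the horizontal comparison and across layers the vertical one.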
Authors: Nuo Chen, Ning Wu, Shining Liang, Ming Gong, Linjun Shou, Dongmei Zhang, Jia Li