Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks (2306.12198v1)

Published 21 Jun 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Investigating deep learning language models has always been a significant research area due to the "black box" nature of most advanced models. With the recent advancements in transformer-based pre-trained language models and their increasing integration into daily life, addressing this issue has become more pressing. To achieve an explainable AI model, it is essential to comprehend the procedural steps involved and compare them with human thought processes. Thus, in this paper, we use simple, well-understood non-language tasks to explore these models' inner workings. Specifically, we apply a pre-trained language model to constrained arithmetic problems with hierarchical structure and analyze its attention weight scores and hidden states. The investigation reveals promising results: the model addresses hierarchical problems in a moderately structured manner, similar to human problem-solving strategies. Additionally, by inspecting the attention weights layer by layer, we uncover the unconventional finding that layer 10, rather than the model's final layer, is the optimal layer to unfreeze for the least parameter-intensive approach to fine-tuning the model. We support these findings with entropy analysis and token embedding similarity analysis. The attention analysis allows us to hypothesize that the model can generalize to longer sequences in the ListOps dataset, a conclusion later confirmed through testing on sequences longer than those in the training set. Lastly, using a straightforward task in which the model predicts the winner of a Tic Tac Toe game, we identify limitations of attention analysis, particularly its inability to capture 2D patterns.
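The layer-by-layer attention analysis described in the abstract can be illustrated with a short sketch. This is a minimal example and not the paper's actual code: it assumes a Hugging Face GPT-2 checkpoint as a stand-in for the pre-trained model, feeds it a ListOps-style expression, and computes the mean Shannon entropy of each layer's attention distributions, the kind of per-layer statistic the entropy analysis refers to.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

# Illustrative stand-in for the paper's pre-trained model (assumption: GPT-2 small, 12 layers).
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
model.eval()

# A ListOps-style expression, used here purely as example input.
text = "[MAX 4 [MIN 2 7 ] 0 ]"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
for layer_idx, attn in enumerate(outputs.attentions):
    # Shannon entropy of each query position's attention distribution,
    # averaged over heads and positions; lower entropy means more focused attention.
    entropy = -(attn * torch.log(attn + 1e-12)).sum(dim=-1)
    print(f"layer {layer_idx:2d}: mean attention entropy = {entropy.mean().item():.3f}")
```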

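The layer-10 finding also lends itself to a short sketch. The snippet below is a hedged illustration of the "unfreeze a single layer" idea behind the least parameter-intensive fine-tuning approach: it assumes GPT-2 small with a generic sequence-classification head, so the checkpoint, head, and layer indexing are assumptions rather than the paper's exact setup.

```python
import torch
from transformers import GPT2ForSequenceClassification

# Assumed setup: GPT-2 small with a classification head (not the paper's exact configuration).
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=10)

# Freeze every parameter, then re-enable gradients only for transformer block 10
# and the task head, which must stay trainable.
for param in model.parameters():
    param.requires_grad = False
for param in model.transformer.h[10].parameters():  # block 10 of the 12 blocks in GPT-2 small
    param.requires_grad = True
for param in model.score.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} / {total:,}")
```

Under these assumptions, only roughly one transformer block's worth of parameters (plus the small task head) receives gradient updates, which is what makes the single-layer strategy attractive when full fine-tuning is too costly.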