Towards Generating Informative Textual Description for Neurons in Language Models (2401.16731v1)

Published 30 Jan 2024 in cs.CL and cs.AI

Abstract: Recent developments in transformer-based LLMs have allowed them to capture a wide variety of world knowledge that can be adapted to downstream tasks with limited resources. However, which pieces of information these models understand is unclear, and the neuron-level contributions to encoding them are largely unknown. Conventional approaches to neuron explainability either depend on a finite set of pre-defined descriptors or require manual annotations for training a secondary model that can then explain the neurons of the primary model. In this paper, taking BERT as an example, we remove these constraints and propose a novel, scalable framework that ties textual descriptions to neurons. We leverage the potential of generative LLMs to discover human-interpretable descriptors present in a dataset and use an unsupervised approach to explain neurons with these descriptors. Through various qualitative and quantitative analyses, we demonstrate the effectiveness of this framework in generating useful, data-specific descriptors with little human involvement and in identifying the neurons that encode these descriptors. In particular, our experiments show that the proposed approach achieves 75% precision@2 and 50% recall@2.
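
As a rough illustration of the kind of unsupervised neuron-to-descriptor assignment and the precision@2 / recall@2 scoring the abstract refers to, the Python sketch below ranks a neuron's candidate descriptors by its mean activation on sentences tagged with each descriptor, then scores the top-2 list against gold labels. The mean-activation ranking rule, the function names, and the toy data are assumptions made for illustration; they are not the authors' actual procedure.

import numpy as np

def rank_descriptors_for_neuron(activations, descriptor_masks):
    """Rank candidate descriptors for one neuron by the neuron's mean
    activation over the sentences tagged with each descriptor.

    activations:      (num_sentences,) array of the neuron's activations
    descriptor_masks: dict mapping descriptor -> boolean (num_sentences,) mask
    """
    scores = {
        d: activations[mask].mean() if mask.any() else float("-inf")
        for d, mask in descriptor_masks.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

def precision_recall_at_k(predicted, gold, k=2):
    """precision@k and recall@k for one neuron's ranked descriptor list."""
    top_k = set(predicted[:k])
    hits = len(top_k & set(gold))
    precision = hits / k
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

# Toy usage: one neuron, four sentences, two candidate descriptors.
acts = np.array([0.9, 0.1, 0.8, 0.2])
masks = {
    "negation": np.array([True, False, True, False]),
    "past tense": np.array([False, True, False, True]),
}
ranked = rank_descriptors_for_neuron(acts, masks)          # ['negation', 'past tense']
p2, r2 = precision_recall_at_k(ranked, gold=["negation"])  # (0.5, 1.0)

In this toy case the neuron fires on the two negation-tagged sentences, so "negation" is ranked first; with a single gold descriptor, precision@2 is 0.5 and recall@2 is 1.0, mirroring how the paper's headline numbers are computed over many neurons.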


