Enhancing Interpretability using Human Similarity Judgements to Prune Word Embeddings (2310.10262v1)

Published 16 Oct 2023 in cs.CL

Abstract: Interpretability methods in NLP aim to provide insights into the semantics underlying specific system architectures. Focusing on word embeddings, we present a supervised-learning method that, for a given domain (e.g., sports, professions), identifies a subset of model features that strongly improves prediction of human similarity judgments. We show that this method keeps only 20-40% of the original embedding features, for 8 independent semantic domains, and that it retains different feature sets across domains. We then present two approaches for interpreting the semantics of the retained features. The first obtains the scores of the domain words (co-hyponyms) on the first principal component of the retained embeddings, and extracts terms whose co-occurrence with the co-hyponyms tracks the profile of these scores. This analysis reveals that humans differentiate, e.g., sports based on how gender-inclusive and international they are. The second approach uses the retained sets as variables in a probing task that predicts values along 65 semantically annotated dimensions for a dataset of 535 words. The features retained for professions are best at predicting cognitive, emotional and social dimensions, whereas features retained for fruits or vegetables best predict the gustation (taste) dimension. We discuss implications for alignment between AI systems and human knowledge.
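The abstract's central step is a supervised search for the embedding dimensions whose retention improves prediction of human similarity judgments within a domain. The sketch below illustrates that idea with a simple greedy backward elimination over dimensions; the selection procedure, the toy random data, and the helper names (pairwise_cosine, greedy_prune) are illustrative assumptions, not the paper's actual algorithm or data.

```python
# Minimal sketch (not the paper's implementation): prune embedding dimensions
# so that cosine similarity on the retained dimensions better matches human
# pairwise similarity ratings for a single semantic domain.
import numpy as np
from scipy.stats import spearmanr


def pairwise_cosine(emb):
    """Upper-triangle cosine similarities for all word pairs of an (n_words, n_dims) matrix."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit.T
    rows, cols = np.triu_indices(len(emb), k=1)
    return sims[rows, cols]


def greedy_prune(emb, human_sims, min_dims=10):
    """Greedy backward elimination: drop a dimension whenever doing so does not
    lower the Spearman correlation with human similarity judgments."""
    keep = list(range(emb.shape[1]))
    best = spearmanr(pairwise_cosine(emb[:, keep]), human_sims)[0]
    improved = True
    while improved and len(keep) > min_dims:
        improved = False
        for dim in list(keep):
            trial = [d for d in keep if d != dim]
            score = spearmanr(pairwise_cosine(emb[:, trial]), human_sims)[0]
            if score >= best:
                best, keep, improved = score, trial, True
                break
    return keep, best


# Toy usage with random stand-ins for real embeddings and human ratings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 50))        # 20 domain words, 50-dim vectors
ratings = rng.uniform(size=20 * 19 // 2)      # one human rating per word pair
kept, rho = greedy_prune(embeddings, ratings)
print(f"kept {len(kept)}/50 dimensions, Spearman rho = {rho:.3f}")
```

On real data, the retained subset would then feed the two interpretation analyses described in the abstract: projecting the domain words onto the first principal component of the retained dimensions, and using the retained dimensions as predictors in the 65-dimension probing task.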
