
Using Shapley interactions to understand how models use structure (2403.13106v2)

Published 19 Mar 2024 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract: Language is an intricately structured system, and a key goal of NLP interpretability is to provide methodological insights for understanding how language models represent this structure internally. In this paper, we use Shapley-Taylor interaction indices (STII) to examine how language and speech models internally relate and structure their inputs. Pairwise Shapley interactions measure how much two inputs work together to influence model outputs beyond what we would expect from linearly adding their independent influences, providing a view into how models encode structural interactions between inputs. We relate the interaction patterns in models to three underlying linguistic structures: syntactic structure, non-compositional semantics, and phonetic coarticulation. We find that autoregressive text models encode interactions that correlate with the syntactic proximity of inputs, and that both autoregressive and masked models encode nonlinear interactions in idiomatic phrases with non-compositional semantics. Our speech results show that inputs are more entangled for pairs where a neighboring consonant is likely to influence a vowel or approximant, showing that models encode the phonetic interaction needed for extracting discrete phonemic representations.


Summary

  • The paper applies Shapley-Taylor interaction indices (STII) to quantify nonlinear feature interactions and relate them to underlying structure across diverse data modalities.
  • It demonstrates distinct interaction patterns across tasks: syntactic sensitivity in language models, coarticulation effects in speech models, and edge-versus-texture distinctions in image classification.
  • The study offers a framework for improving model interpretability and transparency, pointing toward multidisciplinary improvements in AI system design.

Shapley Interactions Shed Light on Model Behavior Across Diverse Data Modalities

Introduction

Interpretability research in AI increasingly asks not only which input features matter to a model, but how a model relates those features to one another. Shapley-Taylor interaction indices (STII) have emerged as a useful tool for dissecting such nonlinear relationships. Using STII, this paper examines how underlying data structure shapes model representations across linguistic constructs, speech patterns, and visual inputs.

Shapley Interactions and Their Computation

The roots of STII lie in game theory, specifically in Shapley values, which provide additive per-feature attributions. Because deep learning models are highly nonlinear, Shapley values alone cannot capture how features interact. STII bridges this gap by quantifying the joint effect of feature sets on model outputs, as opposed to isolating individual contributions.
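For reference, the pairwise (order-2) index can be written as follows, with notation from the Shapley-Taylor literature: $N$ is the full feature set, $n = |N|$, and $f$ is a set function giving the model output when only the features in the argument are present:

$$
\mathcal{I}_{ij} \;=\; \frac{2}{n} \sum_{S \subseteq N \setminus \{i,j\}} \frac{\delta_{ij} f(S)}{\binom{n-1}{|S|}},
\qquad
\delta_{ij} f(S) \;=\; f(S \cup \{i,j\}) - f(S \cup \{i\}) - f(S \cup \{j\}) + f(S).
$$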

  • Shapley-Taylor Interaction Indices (STII): use a discrete second-order derivative, averaged over subsets of the remaining features, to measure the interaction between a pair of inputs in high-dimensional input spaces (a minimal implementation sketch follows this list).
  • Computation Challenge: exact computation requires summing over all subsets of the remaining features, which is exponential in the input size; for tasks with large feature spaces such as natural language processing or image classification, the index must be approximated, for example by subset sampling.
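Concretely, the definition above can be evaluated by exhaustive enumeration when the number of features is small. The sketch below is illustrative rather than the paper's implementation: `f` stands in for any set function, such as a model's output logit with the features outside `S` ablated, and the ablation scheme itself is left abstract.

```python
# Exhaustive order-2 Shapley-Taylor interaction index:
#   I({i,j}) = (2/n) * sum over S ⊆ N\{i,j} of delta_ij f(S) / C(n-1, |S|)
#   delta_ij f(S) = f(S∪{i,j}) - f(S∪{i}) - f(S∪{j}) + f(S)
from itertools import combinations
from math import comb

def stii_pair(f, features, i, j):
    rest = [k for k in features if k not in (i, j)]
    n = len(features)
    total = 0.0
    for size in range(len(rest) + 1):
        for combo in combinations(rest, size):
            S = set(combo)
            # Discrete second-order derivative of f at the pair (i, j).
            delta = f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S)
            total += delta / comb(n - 1, size)
    return 2.0 * total / n

# Toy check: an additive function has zero pairwise interaction,
# while a multiplicative term does not.
f_add  = lambda S: (1 in S) + (2 in S)
f_mult = lambda S: (1 in S) * (2 in S)
print(stii_pair(f_add,  [1, 2, 3], 1, 2))  # 0.0
print(stii_pair(f_mult, [1, 2, 3], 1, 2))  # 1.0
```

The exponential cost noted above is visible in the nested loop over all subsets; practical uses replace the inner enumeration with Monte Carlo sampling.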

Linguistic Structure Analysis

The paper's linguistic analysis examines how Transformer-based masked language models (MLMs) and autoregressive language models (ALMs) handle syntax and idiomatic expressions, and surfaces two main findings:

  1. Syntax and Distance: Interaction values correlate with the syntactic proximity of input pairs; autoregressive models in particular encode interactions that track syntactic structure beyond what linear distance alone would predict (see the dependency-distance sketch after this list).
  2. Idiomatic Expressions: Both masked and autoregressive models show pronounced nonlinear interactions within idiomatic multiword expressions (MWEs), whose meanings are not composed from their parts, distinguishing them from comparable literal phrases.
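The syntactic proximity in finding 1 is typically operationalized as distance in a dependency parse. The sketch below computes that distance with spaCy; the sentence, model name, and helper function are illustrative and not taken from the paper's code.

```python
# Dependency-tree distance between two tokens: the number of edges on
# the path through their lowest common ancestor in the parse tree.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def tree_distance(doc, i, j):
    def path_to_root(tok):
        path = [tok]
        while tok.head is not tok:  # in spaCy, the root is its own head
            tok = tok.head
            path.append(tok)
        return path
    a, b = path_to_root(doc[i]), path_to_root(doc[j])
    depth = {tok.i: d for d, tok in enumerate(a)}
    for d_b, tok in enumerate(b):
        if tok.i in depth:          # lowest common ancestor found
            return depth[tok.i] + d_b
    return len(a) + len(b)          # fallback; parses are trees

doc = nlp("The cat that chased the mouse sat down.")
# "cat" (index 1) is the subject of "sat" (index 6): tree distance 1,
# even though the tokens are five positions apart linearly.
print(tree_distance(doc, 1, 6))
```

Correlating this tree distance with pairwise STII, while controlling for linear distance, is what lets the analysis separate syntactic sensitivity from mere proximity effects.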

Speech Model Findings

Investigating speech models provides insight into the phonological underpinnings of speech recognition systems. The paper identifies a direct relationship between phoneme articulation (the physical shaping of the oral cavity during speech) and the interaction of acoustic features: transitions between consonants and vowels show a higher degree of nonlinear interaction, in line with the phonetic principle that a vowel's acoustic realization is heavily influenced by its neighboring consonants.
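As a hedged illustration of how such entanglement can be measured, the snippet below applies the same discrete second difference to two waveform spans (say, a consonant and the following vowel) in a wav2vec 2.0 encoder. Zeroing spans of the waveform is one simple ablation choice, not necessarily the paper's; the checkpoint name, placeholder audio, and span boundaries are assumptions.

```python
# Second-difference probe of two waveform spans in wav2vec 2.0.
import torch
from transformers import Wav2Vec2Model

model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()
wave = torch.randn(1, 16000)  # 1 s of placeholder 16 kHz audio
spans = {"i": slice(4000, 5600), "j": slice(5600, 8000)}  # hypothetical C and V

def embed(keep_i, keep_j):
    x = wave.clone()
    if not keep_i:
        x[:, spans["i"]] = 0.0  # ablate span i by silencing it
    if not keep_j:
        x[:, spans["j"]] = 0.0
    with torch.no_grad():
        return model(x).last_hidden_state.mean(dim=1)  # pooled features

# delta_ij = f(i,j) - f(i) - f(j) + f(neither); a large norm indicates
# the two spans are entangled rather than independently encoded.
delta = (embed(True, True) - embed(True, False)
         - embed(False, True) + embed(False, False))
print(delta.norm().item())
```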

Image Classification Insights

The exploration extends to image classification, where pixel-level feature interactions offer a window into object detection and boundary identification. The paper shows that pixels near object edges interact differently with pixels inside object foregrounds than with background pixels, providing empirical evidence for the conceptual divide between edges and textures in visual perception.

Implications and Future Directions

The applicability of STII across modalities and tasks underscores its utility in uncovering the structure encoded in model representations. The paper also advances the case for integrating domain expertise into interpretability research, advocating a multidisciplinary approach to understanding AI systems.

  • Theoretical Implications: Illuminating the nuanced relationship between model representations and underlying data structures, thereby enriching the theoretical foundations of AI interpretability.
  • Practical Applications: Offering a blueprint for enhancing model transparency across sectors, enabling stakeholders to make informed decisions based on a deeper understanding of model behaviors.

Concluding Remarks

This comprehensive investigation into STII across linguistic, auditory, and visual tasks not only enriches the interpretability landscape but also champions a closer collaboration between AI research and domain-specific knowledge bases. As the quest for explainable AI advances, integrating these insights can lead to more transparent, trustworthy, and effective AI systems.