Using Shapley interactions to understand how models use structure (2403.13106v2)
Abstract: Language is an intricately structured system, and a key goal of NLP interpretability is to provide methodological insights for understanding how LLMs represent this structure internally. In this paper, we use Shapley Taylor interaction indices (STII) to examine how language and speech models internally relate and structure their inputs. Pairwise Shapley interactions measure how much two inputs work together to influence model outputs beyond the linear sum of their independent influences, providing a view into how models encode structural interactions between inputs. We relate the interaction patterns in models to three underlying linguistic structures: syntactic structure, non-compositional semantics, and phonetic coarticulation. We find that autoregressive text models encode interactions that correlate with the syntactic proximity of inputs, and that both autoregressive and masked models encode nonlinear interactions in idiomatic phrases with non-compositional semantics. Our speech results show that inputs are more entangled for pairs where a neighboring consonant is likely to influence a vowel or approximant, indicating that models encode the phonetic interactions needed for extracting discrete phonemic representations.
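The pairwise interaction described above can be illustrated with a second-order discrete derivative: evaluate the model with both inputs present, with each alone, and with neither, and subtract the independent contributions. The sketch below is a minimal illustration of this idea on a toy function; the function names, the masking-by-baseline scheme, and the toy model are assumptions for exposition, not the paper's implementation (STII additionally averages such derivatives over input subsets).

```python
def pairwise_interaction(f, i, j, baseline, x):
    """Second-order discrete derivative of f w.r.t. inputs i and j.

    Masked inputs are replaced by a baseline value (an illustrative
    ablation scheme, not necessarily the paper's). For a purely additive
    f this quantity is zero; a nonzero value signals that inputs i and j
    interact nonlinearly.
    """
    def mask(keep):
        # Keep features in `keep`; replace the rest with the baseline.
        return [x[k] if k in keep else baseline[k] for k in range(len(x))]

    # f({i, j}) - f({i}) - f({j}) + f({})
    return f(mask({i, j})) - f(mask({i})) - f(mask({j})) + f(mask(set()))

# Toy model with a multiplicative (non-additive) term between inputs 0 and 1:
f = lambda v: v[0] + v[1] + 3 * v[0] * v[1]
x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
pairwise_interaction(f, 0, 1, base, x)  # 3.0: only the nonlinear part survives
```

For a linear model the four terms cancel exactly, which is why this quantity isolates the "working together" effect the abstract refers to.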