- The paper introduces a novel MLEM framework to explain how architectural differences shape linguistic representations across model layers.
- The paper demonstrates that part-of-speech features dominate in BERT and GPT-2, while Mamba uniquely emphasizes word position around layer 10.
- The paper reports a high correlation (τ_weighted = 0.77) between GPT-2 and BERT and a significant divergence (τ_weighted = -0.24) between Mamba and BERT layers.
Insights on Architectural Influences in LLMs via Metric-Learning Encoding Models
The paper investigates whether architectural differences meaningfully shape how LLMs represent and process linguistic input. The authors introduce a novel methodological framework based on Metric-Learning Encoding Models (MLEMs) that enables a feature-centric comparison of how linguistic information is represented across the layers of different LLMs. Rather than stopping at similarity scores, the approach is explanatory: it isolates the specific linguistic features that drive observed similarities and divergences between models.
Methodological Framework
The paper examines three distinct neural architectures: BERT, an encoder-based Transformer; GPT-2, a decoder-based Transformer; and Mamba, a state-space model. The three were chosen for their fundamental architectural differences. Using the MLEM approach, the authors assess linguistic representation at each model layer by assigning feature-importance (FI) values to linguistic features such as tense, grammatical number, and gender. Feature importance is computed as the reduction in the Spearman correlation score when a specific feature is permuted, which identifies the dominant linguistic features at each layer.
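As a concrete illustration of this procedure, the Python sketch below shows one way permutation-based feature importance could be computed in an MLEM-style analysis. It assumes the encoding model regresses pairwise neural distances on feature-mismatch indicators; the function name, the ridge regressor, and the Euclidean distance metric are illustrative choices, not the authors' exact pipeline.

```python
# Minimal sketch of permutation-based feature importance for an MLEM-style
# analysis. Assumptions (not from the paper's code): neural distances are
# Euclidean, the encoding model is a ridge regression from feature-mismatch
# indicators to pairwise distances, and fit is scored with Spearman correlation.
import numpy as np
from itertools import combinations
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge

def feature_importances(layer_acts, features, n_repeats=10, seed=0):
    """layer_acts: (n_sentences, dim) activations at one layer.
    features: (n_sentences, n_features) integer-coded linguistic features
    (e.g. PoS, tense, number, gender, position)."""
    rng = np.random.default_rng(seed)
    neural_dist = pdist(layer_acts)  # condensed pairwise neural distances
    pairs = list(combinations(range(len(layer_acts)), 2))  # same pair order as pdist
    # 1 where the two sentences in a pair differ on a feature, else 0.
    X = np.array([[float(features[i, f] != features[j, f])
                   for f in range(features.shape[1])]
                  for i, j in pairs])
    model = Ridge(alpha=1.0).fit(X, neural_dist)
    base_rho = spearmanr(model.predict(X), neural_dist)[0]

    importances = []
    for f in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, f] = rng.permutation(X_perm[:, f])  # sever this feature's link
            drops.append(base_rho - spearmanr(model.predict(X_perm), neural_dist)[0])
        importances.append(float(np.mean(drops)))  # mean drop in Spearman = FI
    return importances
```

Applying such a function layer by layer would yield one feature-importance profile per layer, whose peaks indicate the dominant linguistic features at that depth.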
Key Findings and Numerical Results
A standout result is the dominant influence of the part-of-speech (PoS) feature across layers of the Transformer models, BERT and GPT-2. Mamba deviates from this pattern: word position plays an increasingly prominent role across its layers, peaking around layer 10. To quantify feature-based similarity between models, the authors compare feature-importance profiles with a weighted Kendall correlation coefficient. High similarity, driven largely by the shared dominance of PoS, is reflected in a correlation of τ_weighted = 0.77 between specific layers of GPT-2 and BERT, whereas the divergence is most pronounced between Mamba and BERT layers, with τ_weighted = -0.24.
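The layer-to-layer comparison itself can be reproduced in spirit with SciPy's weighted Kendall correlation; the feature-importance values below are invented placeholders used purely to illustrate the computation, not numbers from the paper.

```python
# Illustrative comparison of feature-importance profiles from two layers of
# different models using a weighted Kendall correlation. The FI values are
# made up; only the comparison mechanism is shown.
from scipy.stats import weightedtau

# Feature-importance vectors over the same ordered feature set,
# e.g. [PoS, tense, number, gender, word position].
fi_bert_layer  = [0.42, 0.11, 0.09, 0.05, 0.03]
fi_gpt2_layer  = [0.39, 0.13, 0.08, 0.06, 0.02]
fi_mamba_layer = [0.07, 0.04, 0.05, 0.03, 0.36]

tau_similar, _  = weightedtau(fi_gpt2_layer, fi_bert_layer)   # rankings agree -> high tau
tau_distinct, _ = weightedtau(fi_mamba_layer, fi_bert_layer)  # rankings clash -> low/negative tau
print(f"GPT-2 vs BERT:  tau_weighted = {tau_similar:+.2f}")
print(f"Mamba vs BERT:  tau_weighted = {tau_distinct:+.2f}")
```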
Implications for AI Development
The introduction of MLEMs, which align feature-space analyses with neural distances, is a methodological innovation with implications for understanding LLM architectures. The technique could extend beyond natural language processing into domains such as vision and speech, and potentially to neural-system studies comparing human and artificial neural architectures.
Theoretical and Practical Considerations
Theoretically, the MLEM results suggest that models sharing a computational objective such as next-word prediction can nonetheless adopt distinct representational strategies depending on architectural design. Practically, such insights can guide future model design and training regimens tailored to specific linguistic tasks. The methodology enriches the explanatory toolkit for LLMs by identifying the precise linguistic features that drive inter-model similarity or divergence.
Future Prospects
Future work could integrate feature interaction terms into the MLEM framework, clarifying how linguistic features jointly shape representational distances; a minimal form of such an extension is sketched below. Expanding the range of features and datasets is another promising avenue toward a more complete picture of linguistic representations. These enhancements could further refine our understanding of neural architectures in both artificial and biological systems, potentially bridging computational modeling and cognitive neuroscience.
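One hypothetical way to add interaction terms is to augment the feature-mismatch design matrix with pairwise product columns before fitting the encoding model; scikit-learn's `PolynomialFeatures` is used here as an illustrative choice, not something specified in the paper.

```python
# Hypothetical interaction-term extension of the MLEM design matrix.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy feature-mismatch matrix: rows are sentence pairs, columns are features
# (e.g. PoS, tense, number); 1 means the pair differs on that feature.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])

X_int = PolynomialFeatures(degree=2, interaction_only=True,
                           include_bias=False).fit_transform(X)
# Columns now hold each original mismatch indicator plus every pairwise
# product, so the encoding model can weight joint feature changes directly.
print(X_int.shape)  # (3, 6): 3 original columns + 3 interaction columns
```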
In conclusion, the paper's MLEM-based analysis offers a robust framework for dissecting the representational nuances shaped by architectural differences in LLMs, advancing both theoretical understanding and practical model design in computational linguistics.