- The paper shows that intermediate LLM layers achieve peak brain-encoding performance and attributes this to a two-phase abstraction process, validated with fMRI data.
- It combines linear encoding models, manifold-based intrinsic dimensionality estimation, and layerwise surprisal analysis across LLM sizes and training checkpoints.
- Findings indicate that compositional abstraction, not just next-token prediction, drives brain-LLM similarity, guiding future hybrid encoding models.
Evidence from fMRI Supports a Two-Phase Abstraction Process in LLMs
The paper "Evidence from fMRI Supports a Two-Phase Abstraction Process in LLMs" contributes to our understanding of the processes within LLMs by positing a two-phase abstraction process, supported by empirical evidence from functional Magnetic Resonance Imaging (fMRI) studies. Authored by Cheng and Antonello, the paper examines why intermediate layers in LLMs display a stronger ability to predict brain response to language stimuli compared to the final output layers. This research is pivotal in elucidating the correlation between representational properties of LLMs and human brain activity during language comprehension.
Core Hypothesis and Research Questions
The primary hypothesis is that the high predictive performance of intermediate LLM layers in modeling brain activity is driven by their abstractive, compositional properties rather than by their next-token prediction capabilities. The authors test this hypothesis through three main observables:
- Brain-model representational similarity measured through a linear mapping from LLM representations to fMRI data.
- Intrinsic dimensionality of representations across LLM layers to capture feature complexity.
- Layerwise next-token prediction error to examine the alternate hypothesis that prediction capabilities drive brain-LLM similarity.
Methodology
Brain-Model Similarity
fMRI data from three human subjects were analyzed as they listened to 20 hours of English-language podcasts. Following the encoding-model approach described by Goldstein et al. (2022), the authors fit linear projections from LLM activations to brain activity. The models used included OPT (125M, 1.3B, 13B) and deduped Pythia (6.9B), with multiple training checkpoints used to track behavior over the course of training.
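As a rough illustration of this pipeline, the sketch below fits a ridge-regularized linear map from one layer's activations to voxel responses and scores it by held-out correlation. It assumes the LLM features have already been aligned to the fMRI sampling rate; the variable names, the single train/test split, and the fixed regularization strength are illustrative placeholders rather than the authors' exact cross-validated setup.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def encoding_performance(layer_features, bold, alpha=1000.0, seed=0):
    """Fit a linear (ridge) map from LLM layer activations to fMRI responses
    and score it by voxelwise correlation on held-out time points.

    layer_features : (n_timepoints, n_features) LLM activations, already
                     downsampled/aligned to the fMRI acquisition.
    bold           : (n_timepoints, n_voxels) BOLD responses.
    """
    X_tr, X_te, Y_tr, Y_te = train_test_split(
        layer_features, bold, test_size=0.2, random_state=seed)

    model = Ridge(alpha=alpha)  # regularization strength is a placeholder
    model.fit(X_tr, Y_tr)
    Y_hat = model.predict(X_te)

    # Voxelwise Pearson correlation between predicted and measured responses.
    Y_hat_c = Y_hat - Y_hat.mean(axis=0)
    Y_te_c = Y_te - Y_te.mean(axis=0)
    r = (Y_hat_c * Y_te_c).sum(axis=0) / (
        np.linalg.norm(Y_hat_c, axis=0) * np.linalg.norm(Y_te_c, axis=0) + 1e-12)
    return r  # one correlation per voxel; mean(r) summarizes a layer
```

A proper analysis would respect the temporal structure of the stimulus (contiguous held-out segments, hemodynamic delays) rather than using a random split; the sketch omits those details for brevity.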
Dimensionality of Neural Manifolds
Intrinsic dimensionality (ID) and linear effective dimensionality (d) of LLM representations were calculated using manifold learning techniques. The primary tool was the Generalized Ratios Intrinsic Dimension Estimator (GRIDE). For robustness, PCA with a variance cutoff of 0.99 and the Participation Ratio (PR) were also employed.
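The two linear baselines follow directly from the eigenvalues of the feature covariance; a minimal sketch is shown below (GRIDE itself, a nearest-neighbor-ratio estimator, is omitted here). The function name and the default cutoff are illustrative.

```python
import numpy as np

def linear_dimensionality(acts, var_cutoff=0.99):
    """Two linear estimates of the effective dimensionality of a layer's
    representations `acts`, shaped (n_samples, n_features).

    Returns (d_pca, d_pr):
      d_pca : number of principal components needed to explain `var_cutoff`
              of the total variance.
      d_pr  : participation ratio, (sum lambda_i)^2 / sum lambda_i^2, where
              lambda_i are the eigenvalues of the feature covariance.
    """
    centered = acts - acts.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(acts) - 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]      # descending order
    eigvals = np.clip(eigvals, 0.0, None)

    cum_var = np.cumsum(eigvals) / eigvals.sum()
    d_pca = int(np.searchsorted(cum_var, var_cutoff) + 1)

    d_pr = eigvals.sum() ** 2 / (eigvals ** 2).sum()
    return d_pca, d_pr
```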
Layerwise Surprisal
Next-token prediction error was assessed using the TunedLens approach, which learns an affine mapping from intermediate layers to the vocabulary space. Surprisal scores were computed on The Pile dataset to determine the predictive coding objective's contribution to representational similarity.
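Conceptually, once a tuned-lens-style affine probe has been trained for a layer, that layer's surprisal is the average negative log-probability the probe assigns to the true next token. The sketch below assumes a trained probe and precomputed hidden states; it mirrors the idea rather than the tuned-lens package's actual API, and all argument names are hypothetical.

```python
import torch
import torch.nn.functional as F

def layer_surprisal(hidden, probe_weight, probe_bias, unembed, next_tokens):
    """Average next-token surprisal (in nats) for one intermediate layer.

    hidden       : (n_tokens, d_model) hidden states at that layer
    probe_weight : (d_model, d_model) learned affine map (tuned-lens-style probe)
    probe_bias   : (d_model,)
    unembed      : (d_model, vocab_size) the model's output embedding matrix
    next_tokens  : (n_tokens,) long tensor of true next-token indices
    """
    # Project the hidden states through the probe, then into vocabulary space.
    logits = (hidden @ probe_weight + probe_bias) @ unembed
    log_probs = F.log_softmax(logits, dim=-1)
    # Surprisal = -log p(correct next token), averaged over positions.
    nll = -log_probs.gather(1, next_tokens.unsqueeze(1)).squeeze(1)
    return nll.mean().item()
```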
Findings
The authors present strong evidence that the representational complexity of intermediate layers, as captured by intrinsic dimensionality, correlates strongly with brain encoding performance. This was established robustly across dimensionality metrics and LLM sizes. A notable finding is a two-phase abstraction process within LLMs: the paper identifies an “abstract-predict” phase transition at specific layers (e.g., layer 17 for OPT-1.3B), where the peak in representational dimensionality coincides with the peak in encoding performance. These peaks shift earlier with larger and more extensively trained models.
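One simple way to quantify this correspondence, assuming both layerwise profiles have already been computed, is to correlate them directly and compare their peak layers, as in the sketch below (the function name and returned keys are illustrative).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def layerwise_correspondence(dim_per_layer, encoding_per_layer):
    """Relate the layerwise dimensionality profile to the layerwise
    brain-encoding profile for one model.

    dim_per_layer      : (n_layers,) intrinsic dimensionality estimates
    encoding_per_layer : (n_layers,) mean encoding correlation per layer
    """
    r, _ = pearsonr(dim_per_layer, encoding_per_layer)
    rho, _ = spearmanr(dim_per_layer, encoding_per_layer)
    # Gap between the layer of peak dimensionality and peak encoding.
    peak_gap = int(np.argmax(dim_per_layer)) - int(np.argmax(encoding_per_layer))
    return {"pearson_r": r, "spearman_rho": rho, "peak_layer_gap": peak_gap}
```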
Furthermore, the results show that the relationship between representational dimensionality and encoding performance emerges over the course of LLM training, implying that the compositional abstraction properties driving brain-LLM similarity are reinforced as models train.
Discussion
These findings have several implications. First, the evidence suggests that the similarity between LLMs and human brains is driven by abstract, compositional features rather than by predictive coding of next-token identity. As LLMs become larger and more extensively trained, the layers that best encode brain activity shift earlier, consistent with earlier saturation of abstract features. This separation between abstraction and prediction phases provides valuable insight into the internal structure of LLMs.
Practically, the paper opens new avenues for improving encoding models. Combining the spectral properties of multiple LLM layers to build richer, higher-dimensional representations could improve predictive accuracy beyond what any single layer achieves. These findings also call for exploration across diverse model architectures to confirm generalizability.
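One plausible reading of this idea, sketched below under illustrative assumptions, is to reduce each layer to its leading principal components and concatenate the results into a joint feature space for the same ridge encoding model; the helper name and component count are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

def multilayer_encoding_features(layer_activations, n_components=512):
    """Build a joint feature space from several LLM layers.

    layer_activations : list of (n_timepoints, d_model) arrays, one per layer.
    Each layer is reduced separately and then concatenated, so that no single
    layer's raw dimensionality dominates the joint representation.
    """
    reduced = []
    for acts in layer_activations:
        k = min(n_components, acts.shape[0], acts.shape[1])
        reduced.append(PCA(n_components=k).fit_transform(acts))
    return np.concatenate(reduced, axis=1)

# The joint features can then be passed to the same ridge encoding model
# used for a single layer, e.g. Ridge(alpha=...).fit(joint_features, bold).
```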
Future Directions
The observed phenomena suggest new research trajectories in both theoretical and applied AI. From a theoretical standpoint, investigating whether different architectures exhibit similar phase transitions would deepen the understanding of representational dynamics. Practically, the development of hybrid encoding models utilizing multiple layers could offer superior performance in brain decoding tasks.
In conclusion, Cheng and Antonello provide a compelling analysis of the representational properties driving brain-LLM similarity, emphasizing the complex interplay between abstraction and prediction processes in LLMs. Their findings significantly enhance the understanding of LLM internal processes and their alignment with human brain activity, paving the way for future advancements in encoding and representational learning.