The paper "Contextual Feature Extraction Hierarchies Converge in LLMs and the Brain" investigates the parallels between LLMs and the human brain in terms of language comprehension. The paper builds on prior research that has established certain similarities between LLMs and the brain, aiming to uncover the computational principles that drive these convergences, especially as LLMs evolve. Here are the key insights and findings from the paper:
- Performance and Brain-Likeness: As LLMs improve on standard benchmarks, they increasingly resemble the brain's language processing. This resemblance is quantified by how well neural responses can be predicted from a model's embeddings, and higher-performing LLMs yielded higher prediction accuracy (see the encoding-model sketch after this list).
- Hierarchical Feature Extraction: An intriguing finding is that the hierarchical feature-extraction pathway across an LLM's layers maps onto the brain's own processing hierarchy. Notably, higher-performing LLMs reached comparable encoding accuracy with relatively fewer, shallower layers, suggesting that as models improve, their feature extraction becomes more efficient and more closely mimics the brain's language-processing mechanisms (the layer-depth sketch after this list shows how such a mapping is computed).
- Model Comparisons: The researchers compared a range of high-performing LLMs with similar parameter counts and found that, as the models advanced in capability, they converged toward similar hierarchical processing mechanisms. This implies there may be an optimal way to structure language processing hierarchically that both brains and high-performing LLMs discover independently (a simple way to measure such convergence is sketched after this list).
- Importance of Context: The paper underscores the critical role of contextual information: incorporating longer context improves both model performance and similarity to brain responses, making effective use of context a common factor in the success of LLMs and human neural processing alike (the final sketch after this list shows how context length can be manipulated).
- Future Directions: Based on these findings, the authors suggest avenues for developing LLMs that align more closely with human cognitive processing, which could lead to AI that handles language in a more intuitive, human-like way.
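
To make the first point concrete, here is a minimal sketch of the kind of linear encoding analysis this literature uses to quantify brain-likeness: a ridge regression is fit from LLM embeddings to recorded neural responses, and accuracy is the cross-validated correlation between predicted and actual responses. The arrays below are random placeholders, and none of the variable names come from the paper's code.

```python
# Minimal sketch of a linear encoding model (placeholder data, illustrative
# names). Brain-likeness is scored as the cross-validated Pearson correlation
# between neural responses predicted from LLM embeddings and the real ones.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))  # n_words x hidden_dim (placeholder)
neural = rng.normal(size=(1000, 64))       # n_words x n_electrodes (placeholder)

n_folds = 5
scores = np.zeros(neural.shape[1])
for train, test in KFold(n_splits=n_folds).split(embeddings):
    model = RidgeCV(alphas=np.logspace(-2, 6, 9))
    model.fit(embeddings[train], neural[train])
    pred = model.predict(embeddings[test])
    for e in range(neural.shape[1]):  # per-electrode prediction accuracy
        scores[e] += np.corrcoef(pred[:, e], neural[test][:, e])[0, 1] / n_folds

print(f"mean encoding score across electrodes: {scores.mean():.3f}")
```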
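The layer-to-brain mapping in the second point can be sketched the same way: score every layer's embeddings against every electrode, then express each electrode's best-predicting layer as a relative depth (0 = first layer, 1 = last) so models with different layer counts are comparable. Here `encoding_score` is an assumed helper returning per-electrode accuracies, as in the sketch above.

```python
# Sketch of the hierarchy mapping: which layer best predicts each electrode,
# in relative depth so models of different sizes can be compared directly.
import numpy as np

def best_relative_depth(layer_embeddings, neural, encoding_score):
    """layer_embeddings: list of (n_words x dim) arrays, one per LLM layer.
    encoding_score: assumed helper returning per-electrode accuracy (see above).
    """
    scores = np.stack([encoding_score(X, neural) for X in layer_embeddings])
    best_layer = scores.argmax(axis=0)               # per-electrode best layer
    return best_layer / (len(layer_embeddings) - 1)  # 0 = first, 1 = last

# Electrodes early in the cortical hierarchy should land at shallow relative
# depths; in the paper's framing, stronger models peak at shallower depths.
```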
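One way to cash out the convergence claim in the third point is to resample each model's layer-by-layer encoding profile onto a shared relative-depth axis and correlate the profiles across models; similar hierarchies give high correlations. This is an illustrative analysis, not the paper's exact statistic.

```python
# Illustrative cross-model comparison: interpolate two models' per-layer mean
# encoding scores onto a shared relative-depth grid and correlate them.
import numpy as np

def profile_similarity(profile_a, profile_b, grid_points=50):
    depth = np.linspace(0.0, 1.0, grid_points)
    a = np.interp(depth, np.linspace(0.0, 1.0, len(profile_a)), profile_a)
    b = np.interp(depth, np.linspace(0.0, 1.0, len(profile_b)), profile_b)
    return np.corrcoef(a, b)[0, 1]  # near 1.0 = similar hierarchies

# e.g. profile_similarity(scores_a.mean(axis=1), scores_b.mean(axis=1))
# where scores_* are (n_layers x n_electrodes) encoding-score matrices.
```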
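Finally, the role of context in the fourth point can be probed by embedding the same target word with different amounts of preceding text and re-running the encoding analysis on each version. The sketch below uses GPT-2 through Hugging Face `transformers` purely as an accessible stand-in; the paper evaluated a range of open LLMs.

```python
# Sketch: embed the same final word under increasing context windows.
# GPT-2 is an illustrative stand-in, not one of the paper's specific models.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

def embed_last_word(words, context_len, layer=6):
    """Hidden state of the final word given `context_len` preceding words."""
    text = " ".join(words[-(context_len + 1):])
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer]  # (1, seq_len, dim)
    return hidden[0, -1]  # embedding of the last token

words = "the quick brown fox jumps over the lazy dog".split()
for ctx in (0, 2, 8):
    vec = embed_last_word(words, ctx)
    print(f"context={ctx} words -> embedding norm {vec.norm():.2f}")
```

Feeding these context-varied embeddings back through the encoding model above is the kind of manipulation that reveals longer context improving neural prediction.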
In conclusion, the paper provides compelling evidence that as LLMs improve, their language processing increasingly resembles the brain's, not only in predictive performance but also in the hierarchical feature-extraction pathways, pointing to a fundamental similarity in how both systems handle language. The emphasis on contextual information offers a promising direction for future research aimed at ever more capable, brain-like LLMs.