Instruction-tuning Aligns LLMs to the Human Brain
The paper by Khai Loong Aw et al. investigates how instruction-tuning affects the representational similarity between LLMs and the human brain's language system. Its primary question is whether instruction-tuning, a prevalent fine-tuning approach for LLMs, enhances their alignment with human neural activity and behavior.
The authors evaluate LLM-human similarity on two fronts: brain alignment and behavioral alignment. Brain alignment measures the correspondence between LLM internal representations and neural activity within the human language system, whereas behavioral alignment measures how closely LLM outputs track human behavior on a reading task.
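To make the brain-alignment measure concrete, below is a minimal sketch of the standard encoding-model recipe behind such metrics: fit a cross-validated linear map from LLM activations to fMRI responses and score held-out predictions with Pearson correlation. The function name, array shapes, and ridge penalty are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch of an encoding-model brain-alignment score (illustrative, not
# the paper's exact pipeline): ridge-regress fMRI responses onto LLM
# layer activations, then score held-out predictions with Pearson r.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def brain_alignment(llm_features, brain_responses, alpha=1.0, n_splits=5):
    """llm_features: (n_stimuli, n_dims) layer activations per stimulus.
    brain_responses: (n_stimuli, n_voxels) fMRI responses to the same stimuli.
    Returns the mean held-out Pearson r across voxels and folds."""
    rs = []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train, test in kf.split(llm_features):
        model = Ridge(alpha=alpha).fit(llm_features[train], brain_responses[train])
        pred = model.predict(llm_features[test])
        obs = brain_responses[test]
        for v in range(obs.shape[1]):  # correlate predicted vs. observed, per voxel
            rs.append(np.corrcoef(pred[:, v], obs[:, v])[0, 1])
    return float(np.nanmean(rs))
```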
Several key findings are documented in the paper:
- Improvement in Brain Alignment: Instruction-tuning enhances brain alignment by an average of 6.2% across the evaluated datasets, as measured by the Brain-Score metric. This suggests that the mechanisms by which instruction-tuning encodes world knowledge also bring model representations closer to human brain activity. Notably, brain alignment correlates strongly with both model size (r = 0.95) and performance on tasks requiring world knowledge (r = 0.81).
- Model Properties and Alignment: The research highlights that world knowledge and model size are strongly correlated with brain alignment. The authors compute correlations using performance scores from two benchmarks: Massive Multitask Language Understanding (MMLU) for world knowledge and BIG-Bench Hard (BBH) for problem-solving ability (a minimal correlation sketch follows this list). The results indicate that encoding broad world knowledge is a key factor in aligning LLMs with human brain activity.
- Contrasting Behavioral Alignment: Interestingly, instruction-tuning yields little to no improvement in behavioral alignment, assessed by comparing LLM perplexity with human reading times (see the surprisal sketch after this list). While instruction-tuning brings internal model representations closer to brain activity, this does not translate into a comparable gain on behavioral measures.
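The correlation analysis behind results such as r = 0.81 for MMLU is itself straightforward; here is a hypothetical sketch of the computation. The per-model values below are invented placeholders, not the paper's data.

```python
# Illustrative only: Pearson correlation between per-model brain-alignment
# scores and MMLU accuracy, mirroring the paper's analysis. All values
# are invented placeholders, not the paper's data.
import numpy as np
from scipy.stats import pearsonr

mmlu_accuracy   = np.array([0.32, 0.41, 0.48, 0.55, 0.63])  # placeholder benchmark scores
alignment_score = np.array([0.10, 0.14, 0.15, 0.19, 0.22])  # placeholder brain alignment

r, p = pearsonr(mmlu_accuracy, alignment_score)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```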
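For the behavioral measure, the core ingredient is per-token surprisal from a causal LM. A minimal sketch, using GPT-2 as a stand-in model (the paper evaluates other LLMs): the surprisals extracted here would then be correlated with per-word human reading times over a corpus.

```python
# Minimal sketch of the behavioral-alignment ingredient: per-token
# surprisal from a causal LM. GPT-2 is a stand-in; the paper evaluates
# other models. Surprisals would then be compared with human reading times.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def token_surprisals(text):
    """Return -log p(token | prefix), in nats, for each token after the first."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)  # next-token distributions
    return (-logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]).tolist()

print(token_surprisals("The quick brown fox jumps over the lazy dog."))
```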
The implications of this research are twofold. For NLP, the improvement in brain alignment through instruction-tuning suggests that neural alignment evaluations could serve as a complementary signal when developing LLMs for complex, knowledge-dependent tasks. For neuroscience, the paper offers insight into how world knowledge may shape the neural activity patterns underlying language comprehension and representation.
Several limitations and avenues for future work are acknowledged. Evaluating many models across multiple datasets and metrics carries a substantial computational cost. In addition, the study of LLM-human alignment was largely confined to language comprehension tasks; future work could expand to more diverse cognitive tasks to better characterize the correspondence between human neural processes and LLM representations.
In conclusion, the paper makes a significant contribution by demonstrating that instruction-tuning improves the representational alignment of LLMs with human brain activity, primarily through better encoding of world knowledge. It charts a promising direction for future research at the intersection of LLMs and human neuroscience.