The paper "Language-Specific Neurons: The Key to Multilingual Capabilities in LLMs" (Tang et al., 26 Feb 2024 ) addresses the challenge of understanding how LLMs process multilingual texts without explicit multilingual parallel corpora pre-training. The research introduces a methodology to identify language-specific neurons within Transformer architectures, offering insights into the compositional mechanisms that underpin multilingual capabilities.
LAPE Methodology for Identifying Language-Specific Neurons
The cornerstone of the paper is the Language Activation Probability Entropy (LAPE) method, designed to pinpoint neurons that exhibit a strong activation preference for specific languages. To compute LAPE, multilingual corpora are fed into the LLM and the activation probability of each neuron is recorded for every language. The LAPE score then quantifies how concentrated a neuron's activation is across languages: a low LAPE score indicates a language-specific neuron, one that activates preferentially for a single language or a small set of languages. Mathematically, the LAPE score can be expressed as:
$$\mathrm{LAPE}_i = -\sum_{j=1}^{k} p'_{i,j}\,\log p'_{i,j}, \qquad p'_{i,j} = \frac{p_{i,j}}{\sum_{j'=1}^{k} p_{i,j'}}$$

where $\mathrm{LAPE}_i$ is the LAPE score of the $i$-th neuron, $k$ is the total number of languages, $p_{i,j}$ is the activation probability of the $i$-th neuron in response to the $j$-th language, and $p'_{i,j}$ is that probability normalized into a distribution over languages. Neurons whose activation probabilities have minimal entropy across languages are flagged as language-specific.
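The following is a minimal NumPy sketch of this computation, assuming the per-language activation probabilities have already been collected (e.g., the fraction of tokens on which a given FFN neuron's activation is positive). The percentile and minimum-probability thresholds are illustrative, not the paper's exact settings.

```python
import numpy as np

def lape_scores(act_prob: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Compute LAPE for each neuron.

    act_prob: array of shape (num_neurons, num_languages); entry (i, j) is the
    probability that neuron i activates on text in language j.
    """
    # Normalize each neuron's activation probabilities into a distribution over languages.
    norm = act_prob / (act_prob.sum(axis=1, keepdims=True) + eps)
    # Entropy of that distribution: low entropy => activation is concentrated
    # on one (or a few) languages, i.e. the neuron is language-specific.
    return -(norm * np.log(norm + eps)).sum(axis=1)

def select_language_specific(act_prob: np.ndarray,
                             lape_percentile: float = 1.0,
                             min_prob: float = 0.05) -> np.ndarray:
    """Return indices of candidate language-specific neurons.

    Keeps neurons whose LAPE falls in the lowest `lape_percentile` percent and
    whose activation probability exceeds `min_prob` for at least one language
    (to exclude neurons that are simply inactive everywhere).
    """
    scores = lape_scores(act_prob)
    active = act_prob.max(axis=1) >= min_prob
    cutoff = np.percentile(scores[active], lape_percentile)
    return np.where(active & (scores <= cutoff))[0]

# Toy example: 4 neurons, 3 languages.
probs = np.array([
    [0.90, 0.02, 0.01],   # fires almost only on language 0 -> low LAPE
    [0.30, 0.35, 0.28],   # fires roughly uniformly         -> high LAPE
    [0.01, 0.85, 0.03],   # fires almost only on language 1 -> low LAPE
    [0.00, 0.00, 0.01],   # barely fires at all             -> filtered out
])
print(lape_scores(probs))
print(select_language_specific(probs, lape_percentile=50.0))
```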
Experimental Design and Evaluation Metrics
The paper employed LLaMA-2 (7B, 13B, and 70B) and BLOOM (7.1B) to evaluate the impact of language-specific neurons on multilingual capabilities. The evaluation encompassed two primary tasks: language modeling and open-ended generation. Language-modeling performance was assessed with perplexity (PPL) on Wikipedia corpora, while the open-ended generation task used a translated version of the Vicuna dataset, with GPT-4 serving as the judge of output quality. Ablation studies deactivated the identified language-specific neurons and measured the resulting performance degradation. Alternative identification methods, including Language Activation Value Entropy (LAVE), Parameter Variation (PV) based on monolingual instruction tuning, and Random Selection (RS), were used for comparison.
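As a sketch of how such an ablation could be run with PyTorch and Hugging Face Transformers (not the paper's exact code), assume a LLaMA-2-style checkpoint and a hypothetical map of layer-to-neuron indices produced by the LAPE selection step; the hook zeroes the chosen FFN dimensions before `down_proj` and then scores a toy text by perplexity.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical inputs: the layer -> FFN-neuron-index map would come from the
# LAPE selection step; the model name assumes access to the LLaMA-2 7B checkpoint.
NEURONS_TO_DEACTIVATE = {0: [11, 42], 30: [7, 1234]}
MODEL_NAME = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16, device_map="auto")
model.eval()

def make_deactivation_hook(neuron_ids):
    """Pre-hook for mlp.down_proj: zero selected FFN neurons.

    In LLaMA-style MLPs a 'neuron' is one dimension of
    act_fn(gate_proj(x)) * up_proj(x); that tensor is exactly the input of
    down_proj, so zeroing its columns deactivates those neurons.
    """
    def pre_hook(module, args):
        hidden = args[0].clone()
        hidden[..., neuron_ids] = 0.0
        return (hidden,) + args[1:]
    return pre_hook

handles = [
    model.model.layers[layer].mlp.down_proj.register_forward_pre_hook(make_deactivation_hook(ids))
    for layer, ids in NEURONS_TO_DEACTIVATE.items()
]

# Perplexity of a toy evaluation text with the neurons switched off.
text = "Ceci est un court texte d'évaluation en français."
enc = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss
print("PPL with neurons deactivated:", torch.exp(loss).item())

for h in handles:
    h.remove()
```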
Key Findings: Location and Impact of Language-Specific Neurons
The experimental results indicated that a small proportion of neurons exert a disproportionately large influence on an LLM's ability to process a specific language. Deactivating these neurons resulted in a significant decline in both understanding and generation capabilities for the targeted language. The analysis revealed that language-specific neurons are predominantly located in the bottom and top layers of LLMs: the bottom layers map inputs from different languages into a unified semantic space, while the top layers project the semantic content into the corresponding vocabulary of each language. The impact of deactivation was quantified through perplexity increases and drops in GPT-4-rated quality scores, demonstrating a tangible reduction in processing proficiency for the affected language.
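A quick way to inspect this layer distribution for a given selection of neurons, assuming each identified neuron is recorded as a hypothetical (layer, neuron) pair, is a simple per-layer tally:

```python
from collections import Counter

# Illustrative (layer, neuron) pairs, as produced by the selection step above.
identified = [(0, 11), (0, 42), (1, 7), (30, 1234), (31, 5), (31, 99)]

per_layer = Counter(layer for layer, _ in identified)
num_layers = 32  # e.g. LLaMA-2 7B
for layer in range(num_layers):
    print(f"layer {layer:2d}: {'#' * per_layer.get(layer, 0)}")
```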
Steering LLM Outputs Through Neuron Manipulation
The paper demonstrated the potential for controlling the output language of LLMs by selectively activating or deactivating language-specific neurons. This was achieved by manually activating language-specific neurons, setting their activation values to the average observed for that language, which increased the likelihood that the model would respond in the language of the prompt. Cross-lingual generation was achieved by deactivating neurons associated with the source language and activating those associated with the target language, causing responses to be generated in the desired target language even when the prompt was in a different language.
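The sketch below hedges one way such steering could be wired up, reusing the hook pattern from the ablation example: source-language neurons are zeroed while target-language neurons are pinned to hypothetical per-neuron average activation values, which would have to be measured offline on target-language text. The neuron indices and averages shown are illustrative assumptions, not values from the paper.

```python
import torch

def make_steering_hook(deactivate_ids, activate_ids, avg_values):
    """Forward pre-hook for a LLaMA-style mlp.down_proj module.

    deactivate_ids: FFN neuron indices tied to the source language (set to 0).
    activate_ids:   FFN neuron indices tied to the target language.
    avg_values:     tensor with one average activation per neuron in
                    activate_ids, collected offline on target-language text.
    """
    def pre_hook(module, args):
        hidden = args[0].clone()
        hidden[..., deactivate_ids] = 0.0
        hidden[..., activate_ids] = avg_values.to(device=hidden.device, dtype=hidden.dtype)
        return (hidden,) + args[1:]
    return pre_hook

# Usage sketch (reusing `model` and `tokenizer` from the ablation example):
# handle = model.model.layers[30].mlp.down_proj.register_forward_pre_hook(
#     make_steering_hook(deactivate_ids=[7, 1234],       # e.g. English-specific neurons
#                        activate_ids=[55, 301],         # e.g. French-specific neurons
#                        avg_values=torch.tensor([1.8, 2.3])))
# out = model.generate(**tokenizer("Tell me about the weather.", return_tensors="pt").to(model.device),
#                      max_new_tokens=64)
# handle.remove()
```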
Language Dominance and Resource Allocation
The analysis uncovered a dominance relationship between high-resource languages (e.g., English in LLaMA-2) and low-resource languages. This suggests that low-resource languages are aligned with high-resource languages within the model's representation space, which has implications for transfer learning and cross-lingual adaptation strategies. This dominance was observed through the degree of overlap in language-specific neuron activation patterns, with high-resource languages exhibiting more distinct and robust activation profiles.
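One simple way to quantify that overlap is an overlap coefficient between per-language neuron sets, assuming the sets come from the LAPE selection step; the sets and numbers below are made up for illustration and are not the paper's measurements.

```python
def overlap_coefficient(neurons_a, neurons_b):
    """Fraction of the smaller set's neurons that also appear in the other set."""
    a, b = set(neurons_a), set(neurons_b)
    return len(a & b) / min(len(a), len(b))

# Hypothetical per-language sets of (layer, neuron) pairs.
neurons = {
    "en": {(0, 11), (0, 42), (31, 5), (31, 99)},
    "fr": {(0, 11), (1, 7), (31, 5)},
    "sw": {(0, 11), (31, 5), (31, 99)},
}
for lang in ("fr", "sw"):
    print(lang, "overlap with en:", round(overlap_coefficient(neurons[lang], neurons["en"]), 2))
```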
In summary, the paper provides a detailed examination of language-specific neurons in LLMs, offering insights into their location, impact, and potential for manipulating multilingual outputs. The LAPE method and the experimental findings contribute to a deeper understanding of how LLMs achieve multilingual capabilities.