- The paper reveals that memorisation is a gradual, distributed process across layers, challenging the traditional 'generalisation first, memorisation second' view.
- Layer retraining, swapping, gradient forgetting, and probing methods demonstrate that task-specific factors determine the role of each layer in memorisation.
- Early layers are critically involved in memorisation for many NLP tasks, suggesting new strategies to mitigate overfitting and address privacy concerns.
Memorisation Localisation in NLP
Overview
The paper "Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks" by Verna Dankers and Ivan Titov investigates a central question in neural network research: in which layers of a model does memorisation occur, and how does this depend on the task? The study uses 12 natural language classification tasks and applies four distinct memorisation localisation techniques to gain insights.
Motivations and Research Questions
Neural networks often memorise training data, especially atypical input-output combinations. While the generalisation properties of networks are well studied, the specific mechanisms and layers involved in memorisation remain less well understood. Previous studies, particularly in computer vision, have suggested that generalisable features are learnt in the lower layers of a network, whereas deeper layers are responsible for memorisation. This hypothesis, however, has yielded conflicting results when applied to NLP models. This paper extends the scope to a variety of NLP tasks and employs multiple localisation techniques to provide a nuanced perspective on the generalisation-first, memorisation-second hypothesis.
Methodology
The authors use BERT, OPT-125M, Pythia-160M, and GPT-Neo-125M models, each with 12 layers, fine-tuning them on 12 NLP classification tasks, including sentiment analysis, hate speech detection, and topic classification. A subset of the training labels is perturbed (assigned incorrect labels) to force the models to memorise, and four memorisation localisation methods are applied:
- Layer Retraining: Layers of the network are reset and retrained while other layers are kept frozen.
- Layer Swapping: Layers from a model fine-tuned on perturbed labels are swapped into a model fine-tuned on clean data, revealing which layers carry the memorised behaviour.
- Forgetting Gradients: Gradients are computed by back-propagating a signal that would make the model forget the perturbed examples, and the per-layer L1 norm of these gradients is examined.
- Probing: Probes are trained to differentiate between perturbed (noisy) and clean examples at each layer of the network.
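As a rough illustration of the forgetting-gradients idea, the sketch below computes per-layer L1 gradient norms for a toy two-layer network with a mean-squared-error loss. This is not the authors' implementation: the network shapes, the loss, and the random "noisy" batch standing in for label-perturbed examples are all illustrative assumptions; only the principle of comparing per-layer gradient magnitudes matches the description above.

```python
import numpy as np

# Toy two-layer network: x -> W1 -> ReLU -> W2 -> y_hat (shapes are illustrative)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8)) * 0.1
W2 = rng.normal(size=(8, 1)) * 0.1

def forward(x):
    h = np.maximum(x @ W1, 0.0)          # hidden activations after ReLU
    return h, h @ W2                      # (hidden, prediction)

def per_layer_grad_l1(x, y):
    """Backprop an MSE loss and return the L1 norm of each layer's gradient."""
    h, y_hat = forward(x)
    n = x.shape[0]
    d_out = 2.0 * (y_hat - y) / n         # dLoss/dy_hat for MSE
    g_W2 = h.T @ d_out                    # gradient w.r.t. W2
    d_h = (d_out @ W2.T) * (h > 0)        # backprop through the ReLU
    g_W1 = x.T @ d_h                      # gradient w.r.t. W1
    return {"layer1": float(np.abs(g_W1).sum()),
            "layer2": float(np.abs(g_W2).sum())}

# A random batch stands in for the perturbed (to-be-forgotten) examples
x = rng.normal(size=(16, 4))
y = rng.normal(size=(16, 1))
norms = per_layer_grad_l1(x, y)
print(norms)  # larger norm = layer would change more to forget this batch
```

In the paper's setting, a larger per-layer norm is read as that layer being more involved in storing the perturbed examples.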
Results
- Gradual Process: Memorisation is a gradual process involving weights across many layers rather than being confined to specific layers.
- Task-Dependence: The importance of layers in the memorisation process is task-dependent. For instance, NLU tasks often show more involvement of the lower layers compared to other tasks like topic classification.
- Relevance of Early Layers: Across several tasks, early layers were found to be critically involved in the memorisation process, contrary to traditional hypotheses that deeper layers are responsible for specialisation and memorisation.
Implications and Speculations
The findings challenge the generalisation-first, memorisation-second hypothesis, especially for NLP models. The study indicates that intervening in the model's early layers can effectively mitigate memorisation, which has significant implications:
- Model Editing: Strategies aimed at editing the memory of models might need to be redesigned to consider that memorisation is distributed rather than localised.
- Privacy Concerns: Early-layer intervention might be an effective strategy to address privacy concerns related to the memorisation of sensitive information.
- Overfitting and Generalisation: These insights can help develop methods to reduce overfitting by focusing on early layers during the training or fine-tuning stages.
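One simple form such an early-layer intervention could take is excluding the earliest layers from parameter updates during fine-tuning. The sketch below shows this with a plain SGD step over named parameters; the layer names, scalar "weights", and the choice of freezing as the intervention are illustrative assumptions, not the paper's procedure.

```python
def sgd_step(params, grads, lr, frozen_prefixes):
    """One SGD step that skips parameters whose names match a frozen prefix."""
    updated = {}
    for name, weight in params.items():
        if any(name.startswith(p) for p in frozen_prefixes):
            updated[name] = weight                 # early layer frozen: no update
        else:
            updated[name] = weight - lr * grads[name]
    return updated

# Illustrative 12-layer model with scalar weights and unit gradients
params = {f"layer{i}.weight": float(i) for i in range(12)}
grads = {name: 1.0 for name in params}
frozen = [f"layer{i}." for i in range(3)]          # freeze the three earliest layers
params = sgd_step(params, grads, lr=0.1, frozen_prefixes=frozen)
print(params["layer0.weight"], params["layer3.weight"])  # 0.0 2.9
```

The trailing dot in each prefix ("layer1.") keeps the match exact, so freezing layer 1 does not accidentally freeze layer 10 or layer 11.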
Future Directions
Future research can explore which task properties direct memorisation to specific layers. Examining whether these findings hold for models with more than 12 layers or for different architectures will also be important. Additionally, techniques to visualise and interpret memorisation in finer detail could further illuminate the underlying processes.
Conclusion
This research provides a crucial understanding of memorisation in NLP models, challenging long-held assumptions and proposing a nuanced view that acknowledges the gradual and distributed nature of memorisation. The use of multiple tasks and localisation techniques strengthens the findings and offers new directions for managing memorisation in neural networks.