
Measures of Information Reflect Memorization Patterns (2210.09404v4)

Published 17 Oct 2022 in cs.LG, cs.IT, and math.IT

Abstract: Neural networks are known to exploit spurious artifacts (or shortcuts) that co-occur with a target label, exhibiting heuristic memorization. On the other hand, networks have been shown to memorize training examples, resulting in example-level memorization. These kinds of memorization impede generalization of networks beyond their training distributions. Detecting such memorization could be challenging, often requiring researchers to curate tailored test sets. In this work, we hypothesize -- and subsequently show -- that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization. We quantify the diversity in the neural activations through information-theoretic measures and find support for our hypothesis on experiments spanning several natural language and vision tasks. Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples. Lastly, we demonstrate the utility of our findings for the problem of model selection. The associated code and other resources for this work are available at https://rachitbansal.github.io/information-measures.

Citations (7)

Summary

  • The paper introduces an approach using entropy and mutual information to differentiate between heuristic and example-level memorization.
  • It employs binning and k-nearest neighbor estimators to measure intra- and inter-neuron diversity in neural networks.
  • Experimental results demonstrate a strong correlation between these intrinsic measures and traditional generalization metrics for effective model selection.

This paper explores how information organization within the internal activations of a neural network can reveal different types of memorization, which in turn affect the network's generalization ability. The authors propose using two information-theoretic measures, entropy and mutual information, to quantify this organization. They hypothesize that these measures can distinguish between models exhibiting heuristic memorization (learning spurious correlations or shortcuts) and those exhibiting example-level memorization (overfitting to individual training examples, possibly with noisy labels).

The core idea is based on the observation that diverse activation patterns across neurons and examples might correspond to learning robust, generalizable features, while less diverse or highly correlated patterns might indicate reliance on specific input features (heuristics) or individual training instances.

To quantify this, the paper defines:

  1. Intra-neuron diversity: The variation in the activation values of a single neuron across a dataset of examples. This is measured using the Shannon entropy of the neuron's activation distribution.

    H(A_i) = \sum_{j=1}^{N_{\text{bins}}} p(\hat{a}_i^j)\,\log\frac{1}{p(\hat{a}_i^j)}

    where A_i is the activation of neuron i across examples, \hat{a}_i^j is the j-th discretized (binned) activation value, and p(\hat{a}_i^j) is its empirical probability.

  2. Inter-neuron diversity: The dissimilarity between the activation patterns of different neurons on the same set of examples. This is quantified using the mutual information (MI) between pairs of neuron activations.

    I(A_x; A_y) = \psi(k) + \psi(S) - \frac{1}{S}\sum_{i=1}^{S}\left(\psi(e_{(x,i)}) + \psi(e_{(y,i)})\right)

    where A_x, A_y are the activations of two neurons, S is the number of samples, k is the number of nearest neighbors used in the estimate, \psi is the digamma function, and e_{(x,i)}, e_{(y,i)} count the neighbors of sample i, in the marginal spaces of A_x and A_y respectively, that fall within its k-th-nearest-neighbor distance in the joint space.
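The two measures can be sketched as follows. This is a minimal NumPy/SciPy implementation, not the paper's released code: `binned_entropy` and `ksg_mi` are illustrative names, and the MI routine follows the standard Kraskov (KSG) estimator with the max-norm joint distance.

```python
import numpy as np
from scipy.special import digamma

def binned_entropy(a, n_bins=30):
    """Intra-neuron diversity: Shannon entropy of one neuron's
    activations after discretizing them into equal-width bins."""
    counts, _ = np.histogram(a, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]                      # empty bins contribute 0 to the sum
    return -np.sum(p * np.log(p))

def ksg_mi(x, y, k=3):
    """Inter-neuron diversity: Kraskov k-NN estimate of I(A_x; A_y)
    for two 1-D activation vectors (an O(S^2) sketch, fine for small S)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    s = len(x)
    dx = np.abs(x[:, None] - x[None, :])      # pairwise distances in each marginal
    dy = np.abs(y[:, None] - y[None, :])
    d = np.maximum(dx, dy)                    # max-norm distance in the joint space
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbor
    eps = np.sort(d, axis=1)[:, k - 1]        # distance to the k-th joint neighbor
    nx = (dx < eps[:, None]).sum(axis=1) - 1  # marginal neighbor counts (minus self)
    ny = (dy < eps[:, None]).sum(axis=1) - 1
    return digamma(k) + digamma(s) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

For strongly dependent neuron pairs the KSG estimate is large; for independent pairs it hovers near zero (and can dip slightly negative, a known property of the estimator).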

The computation of these measures for a network involves:

  1. Collecting neuron activations for a set of unlabeled in-distribution examples.
  2. For each neuron, discretizing its activations (e.g., by binning) and computing its entropy (intra-neuron diversity).
  3. For each pair of neurons, estimating the mutual information between their activation distributions (inter-neuron diversity). The paper uses a k-nearest neighbor-based estimator for MI.

The algorithm for computing these measures is summarized as follows:

Algorithm: Compute Information Measures

Input: Network encoder f, set of S unlabeled examples {x_1, ..., x_S}
Output: List of entropies H, List of mutual information values I

1. Compute activations for all neurons N for each example:
   A_1, ..., A_N = {f(x_i)}_{i=1}^S  (Shape: N x S)

2. Initialize H = [], I = []

3. For each neuron i from 1 to N:
   a. Compute Entropy(A_i) using Algorithm 2 (Binning and Shannon Entropy):
      - Discretize activations A_i by binning.
      - Compute probability p for each bin.
      - Compute H_i = - sum(p * log(p))
   b. Add H_i to H.

4. For each pair of neurons (i, j) from 1 to N:
   a. Compute MI(A_i, A_j) using Algorithm 3 (Kraskov estimator):
      - Combine A_i and A_j into joint samples.
      - Find k-nearest neighbor distances in the joint space.
      - Use neighbor counts in marginal spaces (A_i, A_j) within these distances.
      - Compute MI_ij using the Kraskov formula (Eq. 5).
   b. Add MI_ij to I.

5. Return H, I
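Assuming the activations have already been collected into an N x S matrix, the loop above can also be run with off-the-shelf estimators, using SciPy's entropy and scikit-learn's k-NN mutual-information estimator. The random matrix here is a placeholder for real neuron activations, not the paper's data.

```python
import numpy as np
from itertools import combinations
from scipy.stats import entropy as shannon_entropy
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 400))   # placeholder: N=8 neurons x S=400 examples

# Step 3: intra-neuron diversity, one binned entropy per neuron
H = []
for a in acts:
    counts, _ = np.histogram(a, bins=30)
    H.append(shannon_entropy(counts))  # normalizes the bin counts internally

# Step 4: inter-neuron diversity, one k-NN MI estimate per neuron pair
I = [mutual_info_regression(acts[i].reshape(-1, 1), acts[j],
                            n_neighbors=3, random_state=0)[0]
     for i, j in combinations(range(len(acts)), 2)]
```

With N neurons this yields N entropies and N(N-1)/2 pairwise MI values, whose means can then be compared across models.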

The authors validate their hypotheses through extensive experiments on various tasks and models:

  • Heuristic Memorization:
    • Semi-synthetic: Colored MNIST and Sentiment Adjectives (IMDb) datasets were modified to introduce spurious correlations of varying strength (α). Networks trained with higher α (more heuristic memorization) showed lower entropy and higher mutual information.
    • Natural: Bias-in-Bios (gender bias) and NICO++ (image classification with context bias). Networks exhibiting more heuristic memorization (higher gender bias, reliance on image context) showed lower entropy and higher mutual information, consistent with the hypothesis.
  • Example-level Memorization: MNIST and IMDb datasets were used with varying levels of random label shuffling (β). Networks trained with higher β (more memorization) showed higher entropy and lower mutual information.

These findings across diverse setups support the central hypothesis: low entropy and high MI characterize heuristic memorization, while high entropy and low MI indicate example-level memorization. The paper summarizes this in a table relating memorization type to the two diversity measures, entropy and inverse MI: both decrease under heuristic memorization and increase under example-level memorization.

The paper demonstrates the practical utility of these findings for model selection. By ranking models based on their mean entropy or mean mutual information, they achieve high Kendall rank correlation coefficients (τ) with rankings based on traditional, task-specific generalization metrics (like validation accuracy or bias metrics). This suggests that these intrinsic information measures, which do not require labeled OOD or specialized test sets, can serve as effective proxies for generalization performance. For example, on Colored MNIST and Shuffled MNIST, mean entropy and mean MI showed near-perfect correlation (τ = 1.00) with validation accuracy.
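The ranking comparison itself is a one-liner with SciPy's Kendall tau. The five model scores below are made-up numbers purely for illustration; they are not results from the paper.

```python
from scipy.stats import kendalltau

# Hypothetical scores for five candidate models (illustrative values only)
val_accuracy = [0.91, 0.85, 0.78, 0.70, 0.62]   # extrinsic generalization metric
mean_entropy = [2.10, 1.95, 1.70, 1.40, 1.10]   # intrinsic information measure

tau, p_value = kendalltau(mean_entropy, val_accuracy)
print(tau)  # 1.0 here: the two rankings agree on every pair of models
```

The sign of τ depends on the memorization type under study (under example-level memorization, higher entropy tracks worse generalization); what matters for model selection is the strength of the rank agreement.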

The authors discuss limitations, including the comparative nature of the current observations (needing a reference point or expectation about memorization types) and potential challenges in scaling these measures to extremely large models or comparing models with vastly different architectures/capacities. Future work could explore using these measures for Out-of-Distribution (OOD) detection or as regularization terms during training to encourage better generalization.

In summary, the paper presents a novel approach to understanding and evaluating neural network generalization based on the diversity of internal neural activations, quantified using information-theoretic measures (entropy and mutual information). The empirical results strongly support the idea that distinct patterns in these measures reflect different types of memorization, providing a promising direction for intrinsic model evaluation and potentially for improving model generalization.
