Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations (2204.02937v2)
Abstract: Neural network classifiers can rely heavily on simple spurious features, such as backgrounds, to make predictions. However, even in these cases, we show that they still often learn core features associated with the desired attributes of the data, contrary to recent findings. Inspired by this insight, we demonstrate that simple last layer retraining can match or outperform state-of-the-art approaches on spurious correlation benchmarks, at a fraction of the complexity and computational cost. Moreover, we show that last layer retraining on large ImageNet-trained models can also significantly reduce reliance on background and texture information, improving robustness to covariate shift after only minutes of training on a single GPU.
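To make the idea concrete, below is a minimal sketch of last layer retraining as the abstract describes it: freeze an ERM-trained feature extractor, embed a small held-out set in which the spurious attribute is balanced across classes, and refit only the final linear layer. The loader names (`balanced_heldout_loader`, `test_loader`), the choice of an ImageNet-pretrained ResNet-50, and the regularization settings are illustrative assumptions for this sketch, not the paper's exact configuration.

```python
# Minimal sketch of last layer retraining on a frozen feature extractor.
# Loaders and hyperparameters are illustrative, not the paper's exact setup.
import torch
import torchvision
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Start from an ERM-trained model (here: an ImageNet-pretrained ResNet-50)
#    and freeze everything except the final linear layer.
model = torchvision.models.resnet50(weights="DEFAULT").to(device).eval()
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])  # drop model.fc

@torch.no_grad()
def extract_features(loader):
    """Embed a dataset with the frozen backbone; returns numpy arrays."""
    feats, labels = [], []
    for x, y in loader:
        f = feature_extractor(x.to(device)).flatten(1)  # (batch, 2048) embeddings
        feats.append(f.cpu())
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

# 2. Embed a small held-out set in which the spurious attribute is balanced
#    across classes, so the linear head gains nothing from the spurious feature.
X_heldout, y_heldout = extract_features(balanced_heldout_loader)  # hypothetical loader

# 3. Retrain only the last (linear) layer, e.g. as an l1-regularized logistic regression.
clf = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
clf.fit(X_heldout, y_heldout)

# 4. Evaluate the retrained head on frozen test-set features.
X_test, y_test = extract_features(test_loader)  # hypothetical loader
print("test accuracy:", clf.score(X_test, y_test))
```

Because only the linear head is refit on precomputed embeddings, the whole procedure runs in minutes on a single GPU, consistent with the abstract's claim.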