Evading the Simplicity Bias in Neural Networks: Improvements in Out-of-Distribution Generalization
The paper "Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization" by Teney et al. addresses a fundamental issue in neural network training — a phenomenon known as the simplicity bias, where models tend to favor simpler features over more complex ones that may be equally predictive. This inclination can result in suboptimal generalization capabilities, especially in out-of-distribution (OOD) settings. The authors propose a method to alleviate this bias by training a collection of models with a regularizer that promotes diversity among the models, a strategy that can lead to substantial improvements in OOD generalization.
Objective and Methodology
The core objective of the paper is to improve OOD generalization by mitigating the simplicity bias in neural networks. The authors argue that the simplicity bias causes models to exploit only simple, often spurious, correlations in the training data, failing to learn the more complex patterns that reflect the task's underlying mechanisms. To counteract this, the paper proposes training a set of models concurrently while applying a diversity regularizer that penalizes the alignment of the models' input gradients, encouraging the different models to rely on different predictive patterns; a sketch of such a regularizer follows.
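To make the idea concrete, here is a minimal PyTorch sketch of an input-gradient diversity regularizer. It assumes two or more classifiers trained on the same batches and uses the squared cosine similarity of per-example input gradients as the alignment measure; the function names and the exact form of the penalty are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F


def input_gradient(model, x, y):
    """Per-example gradient of the target-class logit w.r.t. the input."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    # Summing the target-class logits lets one backward pass produce
    # a gradient for every example in the batch.
    score = logits.gather(1, y.unsqueeze(1)).sum()
    # create_graph=True keeps the penalty differentiable w.r.t. the
    # model parameters, so it can be minimized during training.
    (grad,) = torch.autograd.grad(score, x, create_graph=True)
    return grad.flatten(start_dim=1)  # shape: (batch, input_dim)


def diversity_penalty(models, x, y):
    """Mean squared cosine similarity of input gradients over model pairs.

    Squaring the cosine makes the penalty insensitive to gradient sign,
    so anti-aligned gradients are discouraged just like aligned ones.
    """
    grads = [input_gradient(m, x, y) for m in models]
    penalty, n_pairs = 0.0, 0
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            cos = F.cosine_similarity(grads[i], grads[j], dim=1)
            penalty = penalty + (cos ** 2).mean()
            n_pairs += 1
    return penalty / max(n_pairs, 1)
```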
Theoretical Insights and Experimental Evaluation
The authors present both theoretical and empirical evidence that this framework fosters diverse models that uncover a broader spectrum of predictive features. The theoretical analysis ties the simplicity bias to the implicit preferences of gradient-based optimizers such as SGD, which steer networks toward simple functions even when more complex, equally predictive ones exist. Because a model's input gradient characterizes which features it relies on locally, penalizing gradient alignment acts as a proxy for functional dissimilarity and pushes the set of models toward distinct, potentially more complex predictive patterns; a training step built on the regularizer above is sketched next.
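Continuing the sketch above, a hypothetical training step might combine each model's task loss with the diversity penalty. Here `lambda_div` is an assumed trade-off hyperparameter, and `models` is a list of networks whose parameters share one optimizer; again, this is an illustrative sketch rather than the authors' code.

```python
# Builds on the diversity_penalty sketch above.
import torch.nn.functional as F


def training_step(models, optimizer, x, y, lambda_div=0.1):
    # Average per-model classification loss on the shared batch.
    task_loss = sum(F.cross_entropy(m(x), y) for m in models) / len(models)
    # Add the pairwise input-gradient alignment penalty.
    loss = task_loss + lambda_div * diversity_penalty(models, x, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After training, a model-selection step (e.g., picking the member with the best validation accuracy on the distribution of interest) would be needed, since only some members of the diverse set are expected to capture the OOD-relevant features.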
The strength of the proposed method is demonstrated through rigorous testing on several challenging benchmarks: multi-dataset collages, Biased Action Recognition (BAR), and the PACS domain generalization benchmark. On the collages, standard models are biased toward the MNIST block and ignore the remaining blocks, whereas the proposed method yields models that also capture signals from the less salient image regions. On BAR, the models generalize better by learning richer representations involving both the person and the context, rather than defaulting to simple associations such as backgrounds that merely correlate with the target action. On PACS, the method identifies models that generalize across unseen visual styles, significantly improving OOD performance.
Implications and Future Directions
The implications of this research are manifold. Practically, mitigating the simplicity bias extends the applicability of neural networks to environments with distribution shift, a challenge in many real-world scenarios, from autonomous driving in unfamiliar locales to medical diagnosis across different populations. Theoretically, the work highlights the critical role of understanding and controlling inductive biases in extending the effectiveness of deep learning.
Future work could refine this methodology by studying the trade-off between the number of trained models and the effective coverage of the hypothesis space. Another direction is integrating the diversity regularizer into full-scale architectures, balancing computational cost against performance gains across tasks and domains.
In conclusion, Teney et al. present compelling evidence that the simplicity bias can be addressed through diversity-driven training, laying a foundation for more capable neural network models in diverse and dynamic environments. The work underscores the importance of understanding inductive biases in order to move past the existing limitations of deep learning frameworks.