Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize (2406.03345v2)

Published 5 Jun 2024 in cs.LG and cs.AI

Abstract: Learning representations that generalize under distribution shifts is critical for building robust machine learning models. However, despite significant efforts in recent years, algorithmic advances in this direction have been limited. In this work, we seek to understand the fundamental difficulty of out-of-distribution generalization with deep neural networks. We first empirically show that perhaps surprisingly, even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the generalization of the student network. Then, by a theoretical study of two-layer ReLU networks optimized by stochastic gradient descent (SGD) under a structured feature model, we identify a fundamental yet unexplored feature learning proclivity of neural networks, feature contamination: neural networks can learn uncorrelated features together with predictive features, resulting in generalization failure under distribution shifts. Notably, this mechanism essentially differs from the prevailing narrative in the literature that attributes the generalization failure to spurious correlations. Overall, our results offer new insights into the non-linear feature learning dynamics of neural networks and highlight the necessity of considering inductive biases in out-of-distribution generalization.

Summary

  • The paper introduces feature contamination, showing that gradient updates couple predictive core features with uncorrelated background features, impairing OOD performance.
  • Empirical experiments demonstrate that matching teacher representations does not guarantee robust OOD generalization, emphasizing the role of non-linear inductive biases.
  • Theoretical analysis of two-layer ReLU networks reveals activation asymmetry and structural learning flaws, suggesting new optimization constraints to mitigate the issue.

Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize

The paper "Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize" addresses the persistent challenge of out-of-distribution (OOD) generalization in neural networks. By scrutinizing the foundational difficulties faced by deep neural networks under distribution shifts, the research introduces the concept of "feature contamination" and offers both empirical and theoretical evidence to elucidate how this phenomenon impedes OOD generalization.

Core Problem and Motivation

The ability of machine learning models to generalize effectively under distribution shifts is critical for their deployment in real-world scenarios. However, recent methods aimed at enhancing OOD generalization have exhibited limited success. Traditional empirical risk minimization (ERM) methods, despite their strong in-distribution (ID) performance, often falter when faced with distribution shifts caused by data variations not reflected in the training set.

This research investigates the fundamental factors behind OOD generalization failure, moving beyond the prevalent narrative that attributes these failures to spurious correlations. The paper posits that neural networks, even when given "good" representations that should in principle enable them to generalize well OOD, still fail due to what is termed "feature contamination."

Empirical Findings

The authors first conduct experiments to test whether explicit access to good representations suffices for OOD generalization. The experiments distill representations from pre-trained teacher models (e.g., CLIP) into randomly initialized student models. Despite being trained to match the teacher models' representations, the student models exhibit significant performance degradation under various distribution shifts compared to the teacher models.
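
For concreteness, the kind of objective involved can be sketched as follows. This is a minimal illustration assuming a frozen teacher encoder (e.g., CLIP's image encoder) and an MSE matching loss; the paper's exact distillation loss and training details may differ.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, images, optimizer):
    """One training step that fits the student's representations to the
    frozen teacher's, in the spirit of the paper's distillation setup.
    The MSE objective here is an assumption, not the authors' exact loss."""
    with torch.no_grad():
        targets = teacher(images)       # frozen teacher features (e.g., a CLIP image encoder)
    preds = student(images)             # student features being fitted
    loss = F.mse_loss(preds, targets)   # match the teacher's representations directly
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Even when such an objective is driven to a low value in distribution, the student's downstream OOD performance lags the teacher's, which is the gap the paper sets out to explain.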

This discrepancy suggests that simply optimizing representation learning objectives does not suffice for OOD generalization unless the inductive biases in the optimization process are also taken into account. Notably, spurious correlations alone cannot account for this gap, indicating the presence of deeper, structural issues in feature learning dynamics.

Theoretical Model and Results

The theoretical analysis employs a structured OOD generalization model in which input features are separated into core features (predictive of and consistent with the label) and background features (uncorrelated with the label). The paper examines the learning behavior of two-layer ReLU networks trained with stochastic gradient descent (SGD) under this model. The primary findings, illustrated by the sketch after the list below, are:

  1. Activation Asymmetry: Neurons exhibit class-wise asymmetry in activation probabilities, driven by their tendency to accumulate positive correlations with examples from only one class.
  2. Feature Contamination: Neurons learn core and uncorrelated background features together; gradient updates inadvertently cause the weights to acquire components along both the core and the background features.
  3. Large OOD Risk: This coupling of core and background features results in substantial OOD performance degradation. When background features shift negatively at test time, neuron activations diminish and generalization suffers.
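
The mechanism can be made concrete with a small, self-contained simulation. The sketch below follows the spirit of the structured feature model (label-aligned core coordinates plus label-independent background coordinates), but the dimensions, architecture width, and hyperparameters are illustrative assumptions; treat it as a qualitative demonstration rather than a reproduction of the paper's construction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Structured feature model (dimensions are illustrative, not the paper's):
# x = [core | background]. Core features carry the label; background features
# are nonzero but statistically independent of the label.
d_core, d_bg = 10, 10

def sample_batch(n, bg_shift=0.0):
    y = torch.randint(0, 2, (n,)).float() * 2 - 1         # labels in {-1, +1}
    core = y[:, None] * torch.ones(n, d_core)              # label-aligned core features
    bg = torch.ones(n, d_bg) + bg_shift                    # label-independent background
    x = torch.cat([core, bg], dim=1) + 0.1 * torch.randn(n, d_core + d_bg)
    return x, y

# Two-layer ReLU network trained with SGD on the logistic loss.
net = nn.Sequential(nn.Linear(d_core + d_bg, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
for _ in range(500):
    x, y = sample_batch(1024)
    loss = nn.functional.softplus(-y * net(x).squeeze(-1)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Feature contamination: first-layer weights can acquire mass on the background
# subspace even though those coordinates are uncorrelated with the label.
W = net[0].weight.detach()
print("first-layer weight norm on core subspace:      ", W[:, :d_core].norm().item())
print("first-layer weight norm on background subspace:", W[:, d_core:].norm().item())

# OOD evaluation: a negative shift of the background features lowers the
# pre-activations of contaminated neurons and can degrade accuracy.
def accuracy(bg_shift):
    x, y = sample_batch(4096, bg_shift)
    with torch.no_grad():
        preds = (net(x).squeeze(-1) > 0).float() * 2 - 1
    return (preds == y).float().mean().item()

print("ID accuracy :", accuracy(0.0))
print("OOD accuracy:", accuracy(-4.0))
```

In runs of this kind, the background-subspace weight norm is typically non-negligible, and shifting the background coordinates at test time tends to suppress activations and reduce accuracy, mirroring the contamination mechanism described above.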

Implications and Future Directions

The paper shows that feature contamination induces OOD failures even when background features are uncorrelated with the labels, a scenario yet to be sufficiently explored in existing literature. This signifies a fundamental challenge posed by the non-linear nature of neural networks.

Moreover, the authors provide evidence that linear networks do not suffer from feature contamination, implying that non-linearity in neural networks introduces this complex behavior. Thus, future algorithms should incorporate the inductive biases of non-linear models explicitly to achieve effective OOD generalization.

Speculative Future Directions:

  • Inductive Biases Incorporation: Developing algorithms that consider the intrinsic inductive biases of non-linear neural networks.
  • Optimization Constraints: Exploring constrained optimization techniques that mitigate feature contamination by restricting gradient updates to subspaces corresponding to core features (see the sketch after this list).
  • Feature De-coupling: Investigating methods to decouple core and background features in learned representations to enhance robustness.
  • Pre-training Analysis: Extending the analysis to understand whether large-scale pre-training inherently mitigates feature contamination by organizing feature representations more linearly.
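
As a purely illustrative example of the "optimization constraints" direction, one could imagine projecting gradients onto a known core-feature subspace. The helper below is hypothetical: the core subspace is not known in realistic settings, and the coordinate split is assumed to match the synthetic setup from the earlier sketch.

```python
import torch

def restrict_grad_to_core(first_layer_weight: torch.Tensor, d_core: int) -> None:
    """Hypothetical constraint: after loss.backward(), zero the gradient
    components of the first-layer weights that point into the background
    subspace, so the SGD step only updates projections onto core features.
    Assumes input coordinates are ordered as [core | background], which is
    only known in synthetic settings such as the model sketched earlier."""
    if first_layer_weight.grad is not None:
        first_layer_weight.grad[:, d_core:] = 0.0

# Usage inside a training loop (net, opt, d_core as in the earlier sketch):
#   loss.backward()
#   restrict_grad_to_core(net[0].weight, d_core)
#   opt.step()
```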

Conclusion

The paper provides a nuanced understanding of how neural networks fail to generalize OOD, even in scenarios designed to rule out known failure modes. By introducing the concept of feature contamination and demonstrating its impact through both empirical and theoretical lenses, the authors pave the way for future research on these deeper structural generalization challenges. The findings stress the importance of acknowledging and incorporating the complex inductive biases of non-linear models into the design of algorithms for robust machine learning systems.