Simplicity Bias of Transformers to Learn Low Sensitivity Functions (2403.06925v1)
Abstract: Transformers achieve state-of-the-art accuracy and robustness across many tasks, but an understanding of their inductive biases, and of how those biases differ from other neural network architectures, remains elusive. Various neural network architectures, such as fully connected networks, have been found to have a simplicity bias towards learning simple functions of the data; one version of this simplicity bias is a spectral bias towards learning simple functions in Fourier space. In this work, we identify the sensitivity of the model to random changes in the input as a notion of simplicity bias that provides a unified metric for explaining the simplicity and spectral bias of transformers across different data modalities. We show that transformers have lower sensitivity than alternative architectures, such as LSTMs, MLPs, and CNNs, on both vision and language tasks. We also show that the low-sensitivity bias correlates with improved robustness; furthermore, it can be used as an efficient intervention to further improve the robustness of transformers.
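The sensitivity measure described in the abstract can be approximated empirically by perturbing a small random fraction of the input and recording how much the model's output changes. The sketch below is a minimal Monte Carlo estimate of this idea; the function name, the Gaussian-replacement perturbation, and the `flip_prob` parameter are illustrative assumptions, not the paper's exact definition.

```python
import torch

def estimate_sensitivity(model, x, num_samples=16, flip_prob=0.05):
    """Rough Monte Carlo estimate of a model's sensitivity: the average
    change in output when a small random fraction of input coordinates
    is resampled. `model` maps a batch of inputs to logits."""
    model.eval()
    with torch.no_grad():
        base = model(x)                       # reference outputs (e.g. logits)
        total = 0.0
        for _ in range(num_samples):
            # choose a random subset of coordinates to perturb
            mask = (torch.rand_like(x) < flip_prob).float()
            noise = torch.randn_like(x)       # random replacement values
            x_pert = x * (1.0 - mask) + noise * mask
            pert = model(x_pert)
            # average output change for this perturbation draw
            total += (pert - base).norm(dim=-1).mean().item()
    return total / num_samples
```

Under this sketch, a lower returned value indicates a lower-sensitivity (and, per the paper's thesis, simpler and more robust) function of the input.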