- The paper introduces a novel visualization technique that maps decision boundaries to assess reproducibility and double descent phenomena.
- It finds that wide architectures such as ResNet and WideResNet exhibit high reproducibility across training runs, with sharpness-aware minimization (SAM) and knowledge distillation further stabilizing decision boundaries.
- Quantitative fragmentation scores tie the test-error peak of double descent to chaotic instability of decision regions at the interpolation threshold, especially under label noise.
An Analytical Overview of Neural Network Reproducibility and Double Descent
This paper examines two related phenomena in neural networks, the reproducibility of decision boundaries across training runs and double descent, using a new visualization technique. By plotting decision boundaries directly, the authors probe how consistently independently trained models carve up input space and how the geometry of class regions changes near the interpolation threshold, insights that advance the theoretical understanding of neural network training dynamics.
Decision Boundary Visualization: A Methodological Approach
The authors introduce a method for visualizing decision boundaries, an aspect of trained networks that has received far less attention than the loss landscape. Prior work on decision boundaries centers largely on adversarial examples; this paper instead visualizes class regions in the two-dimensional plane spanned by randomly selected input images, which permits comparisons of reproducibility and generalization across architectures. Because the plane passes through real data points, the tool captures decision surfaces near the data manifold and reveals structure that loss-landscape visualizations tend to miss.
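A minimal sketch of this kind of plane-based visualization, assuming a trained PyTorch classifier `model` and three image tensors `img_a`, `img_b`, `img_c`; the helper name, grid range, and plotting call are illustrative rather than the authors' code.

```python
import torch

def plane_predictions(model, img_a, img_b, img_c, steps=100, lo=-0.5, hi=1.5):
    """Classify every point of a grid in the plane through three images."""
    shape = img_a.shape                    # e.g. (3, 32, 32)
    a0 = img_a.flatten()
    v1 = img_b.flatten() - a0
    v2 = img_c.flatten() - a0
    # Orthogonalize v2 against v1 (Gram-Schmidt) so the plane axes are perpendicular.
    v2 = v2 - (v2 @ v1) / (v1 @ v1) * v1
    coords = torch.linspace(lo, hi, steps)
    preds = torch.zeros(steps, steps, dtype=torch.long)
    model.eval()
    with torch.no_grad():
        for i, a in enumerate(coords):
            # One row of the grid: fix the offset a*v1, sweep along v2.
            row = torch.stack([(a0 + a * v1 + b * v2).view(shape) for b in coords])
            preds[i] = model(row).argmax(dim=1)
    return preds

# Usage: color each grid point by its predicted class to draw the class regions.
# import matplotlib.pyplot as plt
# plt.imshow(plane_predictions(model, img_a, img_b, img_c).numpy(), origin="lower")
# plt.show()
```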
Reproducibility Across Architectural Variants
Asking whether neural networks can learn essentially the same model under different random initializations, the paper finds that decision boundaries are strikingly similar across runs, particularly for wide architectures: ResNet and WideResNet show high reproducibility, suggesting that width correlates with the consistency of learned class regions. Differences in inductive bias across model families (e.g., convolutional networks versus Vision Transformers) are also quantified, with reproducibility scores measuring how strongly architecture shapes the class regions.
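One way to make such a reproducibility score concrete is to measure how often two independently trained models agree on the sampled plane; this is a hedged sketch reusing the `plane_predictions` helper above, and the paper's own score (e.g., an IoU over class regions) may be defined differently.

```python
def plane_agreement(model_1, model_2, triplets, steps=100):
    """Fraction of plane grid points on which two models predict the same class,
    averaged over several randomly chosen image triplets."""
    scores = []
    for img_a, img_b, img_c in triplets:
        p1 = plane_predictions(model_1, img_a, img_b, img_c, steps)
        p2 = plane_predictions(model_2, img_a, img_b, img_c, steps)
        scores.append((p1 == p2).float().mean().item())
    return sum(scores) / len(scores)
```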
The analysis extends to the effect of the optimizer and of knowledge distillation on decision boundaries. Sharpness-aware minimization (SAM) markedly increases reproducibility, consistent with its regularizing effect, while distillation transfers the teacher's boundary characteristics to the student, though the effect is more pronounced for some architectures than others.
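For context, SAM replaces the usual gradient step with a two-pass update that descends the loss at an adversarially perturbed weight setting. The sketch below shows one such step under the standard SAM recipe; it assumes every parameter receives a gradient, and the names (`loss_fn`, `base_optimizer`, `rho`) are illustrative.

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    # First pass: gradient at the current weights.
    base_optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    # Ascent step: move distance rho along the normalized gradient (worst-case direction).
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(rho * g / (grad_norm + 1e-12))
    # Second pass: the gradient at the perturbed weights drives the actual update.
    base_optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(rho * g / (grad_norm + 1e-12))  # undo the perturbation
    base_optimizer.step()
```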
Double Descent and Fragmentation in Decision Regions
The paper provides empirical evidence that double descent is accompanied by pronounced instability of decision boundaries at the interpolation threshold. A quantitative fragmentation score shows that class regions shatter into many disconnected pieces near this threshold, especially in the presence of label noise, mirroring the dramatic shifts in class regions and the peak in test error. This chaotic fragmentation is consistent with theoretical predictions of a variance spike at the interpolation threshold, while offering a distinctly geometric view of the instability.
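A hedged sketch of one possible fragmentation measure, counting connected same-class regions in the sampled plane produced by `plane_predictions`; the paper's exact definition may differ. A clean split into a handful of regions scores low, while a shattered plane at the interpolation threshold scores much higher.

```python
import numpy as np
from scipy.ndimage import label

def fragmentation_score(preds):
    """Total number of connected same-class regions in a plane of class predictions."""
    grid = preds.numpy()
    n_regions = 0
    for cls in np.unique(grid):
        # Count 4-connected blobs of grid points assigned to this class.
        _, n = label(grid == cls)
        n_regions += n
    return n_regions
```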
Further analysis weighs against the hypothesis that shrinking regions around mislabeled points drive the second descent; instead, instability in the form of fragmented, oscillating class regions emerges as the dominant mechanism. Margin analysis shows that decision-region margins grow even around mislabeled points in the over-parameterized regime, so the decline in test error cannot simply be attributed to shrinking "error bubbles" around mislabeled samples.
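To illustrate what such a margin analysis might look like, the sketch below estimates a point's margin as the smallest perturbation radius, searched along a few random directions, at which the predicted class changes. This is an assumption-laden stand-in for the paper's margin computation, intended only to show how the size of regions around (mislabeled) training points can be tracked as model capacity grows.

```python
import torch

def estimated_margin(model, x, n_dirs=10, max_radius=5.0, steps=50):
    """Rough margin of input x: smallest radius along random directions
    at which the model's predicted class flips."""
    model.eval()
    margin = max_radius
    with torch.no_grad():
        base_class = model(x.unsqueeze(0)).argmax(dim=1).item()
        radii = torch.linspace(0, max_radius, steps)
        for _ in range(n_dirs):
            d = torch.randn_like(x)
            d = d / d.norm()
            for r in radii[1:]:
                if model((x + r * d).unsqueeze(0)).argmax(dim=1).item() != base_class:
                    margin = min(margin, r.item())
                    break
    return margin
```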
Implications and Future Directions
The implications of this work are severalfold: a sharper picture of the inductive biases of different architectures, a clearer account of how reproducible trained models are, and a better handle on the behavior underlying double descent. The visualization framework also gives future research a concrete tool for dissecting decision boundaries, and may eventually help predict generalization across architectures.
The paper points to several directions for future work, including a deeper theoretical treatment of decision-boundary dynamics across architectures and training regimes. The fragmentation results also suggest rethinking optimization strategies to damp instability near critical capacity thresholds, with potential gains in robustness and generalization.
Overall, the work complements the existing literature on neural network behavior, contributing empirical evidence and analytical tools that enrich our understanding of training dynamics and architecture design.