Deep Ensembles: A Loss Landscape Perspective
The paper "Deep Ensembles: A Loss Landscape Perspective" analyzes deep ensembles through the lens of the neural network loss landscape. The authors examine why deep ensembles outperform approximate Bayesian Neural Networks (BNNs) and empirically investigate how diverse the solutions sampled by various ensembling and subspace methods actually are.
Summary of Key Contributions
The paper investigates the hypothesis that deep ensembles explore multiple distinct modes in function space, whereas popular scalable variational Bayesian methods tend to concentrate on a single mode. The authors argue that this multi-mode exploration is a key reason deep ensembles outperform BNNs, especially under dataset shift.
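To make the recipe concrete, here is a minimal, hedged sketch of a deep ensemble in PyTorch. The toy data, tiny MLP, and hyperparameters are illustrative placeholders, not the paper's setup; the essential ingredients are independent random initializations and averaging in probability space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_net():
    # Tiny MLP stand-in for the paper's CNN/ResNet architectures.
    return nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))

def train(net, x, y, steps=200, lr=1e-2):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(net(x), y).backward()
        opt.step()
    return net

# Toy two-class data; the paper uses CIFAR-10/100 and ImageNet.
torch.manual_seed(0)
x = torch.randn(512, 2)
y = (x[:, 0] * x[:, 1] > 0).long()

# Each member starts from its own random initialization, so training
# descends into a different mode of the loss landscape.
members = [train(make_net(), x, y) for _ in range(5)]

with torch.no_grad():
    # Ensemble prediction: average the predictive probabilities, not the logits.
    probs = torch.stack([F.softmax(m(x), dim=-1) for m in members]).mean(0)
```

Averaging in probability space is the standard way a deep ensemble forms its predictive distribution; the diversity across members is what improves calibration and robustness.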
Key contributions of the paper include:
- Empirical Comparison of Ensemble Methods: The authors compare deep ensembles against subspace-sampling methods, such as dropout and (diagonal or low-rank) Gaussian approximations around a single trained solution. The evidence shows that independent random initialization explores function space far more broadly, which translates into better performance (a sketch of Gaussian subspace sampling follows this list).
- Diversity vs. Accuracy: The paper introduces a diversity-accuracy plane to quantify the trade-off between prediction diversity and accuracy. Independently initialized ensembles achieve a better balance than subspace methods, suggesting that random initialization is crucial to the performance gains (one formalization of the plane's axes is sketched below).
- Mode Connectivity Exploration: Loss-landscape analysis shows that independently trained solutions occupy distinct modes: their functions disagree substantially despite similar accuracy, and linear paths between them in weight space cross high-loss barriers (see the interpolation sketch below).
- Complementary Benefits: Combining deep ensembles with within-mode subspace sampling yields additional gains in accuracy and uncertainty estimation, since the two sources of diversity are complementary.
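To make the contrast with subspace sampling concrete, here is a hedged sketch of one such method: isotropic Gaussian sampling around a single trained solution. The paper also studies dropout, random, and low-rank Gaussian subspaces; the function name and the sigma value below are illustrative, not from the paper.

```python
import copy
import torch

def gaussian_subspace_samples(trained_net, n_samples=5, sigma=0.01):
    """Draw ensemble members from an isotropic Gaussian around one solution.

    All samples stay near a single mode of the loss landscape, which is why
    their functions are far less diverse than independently trained networks.
    sigma is an illustrative noise scale, not a value from the paper.
    """
    samples = []
    for _ in range(n_samples):
        member = copy.deepcopy(trained_net)
        with torch.no_grad():
            for p in member.parameters():
                p.add_(sigma * torch.randn_like(p))  # perturb around the mode
        samples.append(member)
    return samples
```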
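The axes of the diversity-accuracy plane can be formalized as follows, consistent with the paper's use of prediction disagreement as the diversity measure. The exact normalization by the reference model's error rate is an assumption of this sketch; the helper names are illustrative.

```python
import numpy as np

def accuracy(preds, labels):
    return float(np.mean(preds == labels))

def disagreement(preds_a, preds_b):
    # Fraction of test points on which two models predict different classes.
    return float(np.mean(preds_a != preds_b))

def normalized_diversity(member_preds, ref_preds, labels):
    # Disagreement with a reference model, normalized by the reference's
    # error rate (an assumption here): values near 1.0 indicate a member
    # that is "as different as it could be" at matched accuracy.
    return disagreement(member_preds, ref_preds) / (1.0 - accuracy(ref_preds, labels))
```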
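The mode-separation claim can be probed with a one-dimensional slice of the landscape: evaluate the loss along the straight line in weight space between two independently trained solutions. A minimal sketch, assuming both networks share the same constructor (such as the `make_net` from the earlier example):

```python
import torch
import torch.nn.functional as F

def loss_along_path(net_a, net_b, make_net, x, y, n_points=21):
    # Evaluate loss at interpolated weights theta(t) = (1-t)*theta_a + t*theta_b.
    sd_a, sd_b = net_a.state_dict(), net_b.state_dict()
    losses = []
    for t in torch.linspace(0.0, 1.0, n_points):
        probe = make_net()
        probe.load_state_dict({k: (1 - t) * sd_a[k] + t * sd_b[k] for k in sd_a})
        with torch.no_grad():
            losses.append(F.cross_entropy(probe(x), y).item())
    return losses  # a high bump mid-path indicates the two solutions sit in separate modes
```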
Experimental Validation
The experiments span the CIFAR-10, CIFAR-100, and ImageNet datasets and several architectures, including ResNet variants. The key results support the central hypothesis:
- Mode Diversity: Deep ensembles trained from random initializations reach distinct, functionally diverse modes, confirmed via t-SNE embeddings of predictions and direct disagreement measurements in function space (see the sketch after this list).
- Unmatched Diversity: Randomly initialized ensembles exhibit a level of prediction diversity that current variational and subspace methods do not reach, underscoring how poorly single-mode subspace sampling approximates the full posterior.
- Better Trade-offs: Deep ensembles occupy better positions on the diversity-accuracy plane, and their advantage widens under dataset shift on the CIFAR-10-C and ImageNet-C benchmarks.
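One way such a function-space view can be computed: represent each network (or training checkpoint) by its vector of test-set class predictions, measure pairwise disagreement, and embed the resulting distance matrix with t-SNE. Using raw disagreement as the distance and the perplexity value below are assumptions of this sketch, not the paper's exact procedure.

```python
import numpy as np
from sklearn.manifold import TSNE

def function_space_tsne(all_preds):
    # all_preds: (n_models, n_test) integer class predictions, e.g. one row
    # per checkpoint along each training trajectory.
    n = len(all_preds)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Distance between models = fraction of test points where they disagree.
            dist[i, j] = np.mean(all_preds[i] != all_preds[j])
    # perplexity must be < n_models; 5 is an illustrative choice.
    return TSNE(metric="precomputed", init="random", perplexity=5).fit_transform(dist)
```

In such plots, checkpoints from the same run cluster together while runs from different random initializations land in well-separated clusters, which is the visual signature of distinct function-space modes.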
Implications and Future Work
The findings carry implications for the design of both ensemble methods and Bayesian approaches. Since deep ensembles succeed by reaching diverse function-space modes, explicitly building diversity-seeking behavior into training could help close the gap between Bayesian theory and empirical performance.
Future Developments:
- Enhanced Diversity Methods: Algorithms that explicitly optimize for function-space diversity, beyond what random initialization provides, could further advance ensemble methods.
- Parameter Efficiency: Developing parameter-efficient ensembling methods that preserve prediction diversity could deliver similar benefits at lower memory and compute cost.
- Robustness to Dataset Shift: Further study of how diversified ensembles handle dataset shift could lead to models that generalize better under varying conditions.
In conclusion, the paper offers a rigorous empirical account of why deep ensembles work. By grounding the explanation in the loss landscape, it reframes our understanding of ensemble efficacy in neural networks and sets the stage for developing more robust, theoretically grounded models.