- The paper presents a MIMO configuration that leverages independent subnetworks to achieve ensemble-like robust predictions with a single forward pass.
- It demonstrates significant empirical improvements in accuracy, negative log-likelihood, and calibration error on datasets such as CIFAR-10, CIFAR-100, and ImageNet.
- The study shows that over-parameterized models can naturally partition themselves into independent subnetworks, paving the way for efficient and scalable neural architectures.
An Analysis of "Training Independent Subnetworks for Robust Prediction"
The paper "Training Independent Subnetworks for Robust Prediction" proposes a novel approach to enhancing neural network robustness and uncertainty estimation by training independent subnetworks within a single model. The approach addresses the chief computational cost of ensemble methods, which typically require one forward pass per ensemble member, by training a single model whose subnetworks produce multiple predictions simultaneously.
Key Contributions and Methodology
- Multi-Input Multi-Output (MIMO) Configuration: The central innovation of this work is the MIMO configuration, which allows a single neural network to host multiple subnetworks. The input layer is widened to accept multiple datapoints at once, and the output layer is widened to produce one prediction per input. Each subnetwork learns to predict the label of its own input, independently of the inputs assigned to the other subnetworks (a code sketch follows this list).
- Training and Evaluation: During training, the multiple inputs are sampled independently, so each subnetwork learns from its own (input, label) pairs. At evaluation time, a single input is duplicated across all input slots, and the subnetworks' predictions are averaged, yielding ensemble-like performance improvements. This setup produces multiple predictions in a single forward pass, avoiding the additional runtime traditionally associated with ensembles.
- Empirical Results: The paper reports significant empirical improvements in negative log-likelihood, accuracy, and calibration error on CIFAR-10, CIFAR-100, and ImageNet. The MIMO architecture approaches the performance of traditional ensemble methods, notably Deep Ensembles, while drastically reducing computation time.
- Analysis and Characteristics of Subnetworks: Beyond performance metrics, the paper examines how the subnetworks organize themselves within the shared architecture. The variance of each hidden unit's pre-activations is sparse with respect to the independent inputs (most units respond to only one of them), suggesting a natural separation of subnetworks within the trained model. This separation ensures diverse predictions akin to a traditional ensemble, strengthening robustness (a diagnostic sketch appears below).
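To make the configuration concrete, here is a minimal, hypothetical PyTorch sketch of the MIMO idea. It is not the authors' implementation (the paper uses Wide ResNet and ResNet backbones); the `MimoMLP` class, the layer sizes, and the function names are illustrative assumptions. The input layer is widened to accept M concatenated datapoints, the output layer emits M sets of logits, training pairs each subnetwork with an independently shuffled copy of the batch, and evaluation tiles a single input M times and averages the M predictions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MimoMLP(nn.Module):
    """Toy MIMO model (illustrative): M subnetworks share one MLP body."""

    def __init__(self, in_dim, num_classes, num_members=3, hidden=512):
        super().__init__()
        self.m = num_members
        self.num_classes = num_classes
        # Input layer widened to accept M concatenated datapoints.
        self.body = nn.Sequential(
            nn.Linear(in_dim * num_members, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Output layer widened to emit one prediction per subnetwork.
        self.head = nn.Linear(hidden, num_classes * num_members)

    def forward(self, xs):
        # xs: (batch, M, in_dim) -> logits: (batch, M, num_classes)
        batch = xs.shape[0]
        h = self.body(xs.reshape(batch, -1))
        return self.head(h).reshape(batch, self.m, self.num_classes)


def mimo_train_step(model, x, y, optimizer):
    """One step: each subnetwork sees an independently shuffled batch."""
    perms = [torch.randperm(x.shape[0]) for _ in range(model.m)]
    xs = torch.stack([x[p] for p in perms], dim=1)  # (batch, M, in_dim)
    ys = torch.stack([y[p] for p in perms], dim=1)  # (batch, M)
    logits = model(xs)
    # Mean of per-member cross-entropies: each head fits only its own labels.
    loss = F.cross_entropy(logits.reshape(-1, model.num_classes),
                           ys.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def mimo_predict(model, x):
    """Test time: repeat one input M times, average the M predictions."""
    xs = x.unsqueeze(1).expand(-1, model.m, -1)  # (batch, M, in_dim)
    probs = F.softmax(model(xs), dim=-1)         # (batch, M, num_classes)
    return probs.mean(dim=1)                     # ensemble-style average
```

For image models, the same idea amounts to concatenating the M images along the channel axis and widening the final dense layer, leaving the rest of the backbone untouched.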
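The separation claim can also be probed directly. Below is a hedged sketch of one such diagnostic under the `MimoMLP` assumptions above (the function name and the choice of the first layer are illustrative, not from the paper's code): hold all but one of the M input slots fixed, resample the remaining slot, and record how much each hidden unit's pre-activation varies. Units whose variance is near zero for input slot i play essentially no role in subnetwork i.

```python
@torch.no_grad()
def preactivation_variance(model, data, member, n_samples=256):
    """Per-unit variance of first-layer pre-activations when only the
    `member`-th input slot is resampled (the other slots stay fixed).

    data: (N, in_dim) pool of inputs to resample from.
    Assumes model.body[0] is the first linear layer, as in MimoMLP above.
    """
    fixed = data[: model.m].clone()              # arbitrary fixed inputs
    acts = []
    for i in torch.randint(len(data), (n_samples,)):
        xs = fixed.clone()
        xs[member] = data[i]                     # vary one input slot only
        z = model.body[0](xs.reshape(1, -1))     # pre-activation, (1, hidden)
        acts.append(z.squeeze(0))
    acts = torch.stack(acts)                     # (n_samples, hidden)
    return acts.var(dim=0)                       # sparse if subnetworks split
```

Plotting these per-unit variances for each member should reproduce the qualitative picture the paper reports: variance concentrated on one input per unit, i.e., largely disjoint subnetworks.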
Implications and Future Directions
This research has substantial implications for building resource-efficient neural networks, particularly in settings that demand real-time processing or operate under tight computational budgets. The MIMO configuration makes it feasible to deploy robust, ensemble-like models in practical applications with negligible added latency.
From a theoretical perspective, this approach supports the hypothesis that heavily over-parameterized models can accommodate multiple independent subnetworks. It aligns with the lottery ticket hypothesis, which posits that dense networks contain sparse subnetworks that can match the full network's performance.
Further research might investigate how MIMO scales across neural architectures and tasks, extending beyond image classification to domains such as natural language processing and reinforcement learning. Another promising direction is optimizing the balance between subnetwork independence and shared representations, potentially borrowing techniques from multi-task learning.
Conclusion
This paper offers both a practical and a theoretical advance in neural network design by exploiting the inherent capacity of over-parameterized models through the MIMO configuration. The provision of robust predictions "for free" (that is, without the computational penalties typically incurred by ensemble architectures) marks a significant step toward more efficient, scalable, and deployable AI systems.