- The paper demonstrates that lottery ticket-style compression techniques can produce CARDs that match or exceed the robustness and accuracy of dense models.
- It empirically validates these methods on the CIFAR-10 and CIFAR-100 benchmarks, achieving state-of-the-art performance with models small enough for resource-constrained deployment.
- The study introduces complementary analyses and tools, including a Fourier sensitivity analysis of compressed models and the domain-adaptive CARD-Deck ensemble, which dynamically adapts to distribution shifts.
Compressing Deep Networks for Enhanced Out-Of-Distribution Robustness
The paper explores the balance required to create deep learning models that are simultaneously compact, accurate, and robust to distribution shifts, which the authors term CARDs. It examines existing model compression techniques and empirically demonstrates that compressed networks can match or surpass their uncompressed counterparts in both robustness and accuracy.
Key Contributions
- Analysis of Compression Techniques: The paper investigates a range of pruning strategies, contrasting traditional approaches such as fine-tuning and gradual magnitude pruning with lottery ticket-style methods. The latter, including weight and learning rate rewinding, show greater potential for preserving robustness after compression.
- Lottery Ticket Approach: The authors find that lottery ticket-style methods can produce CARDs efficiently. These methods identify small but effective sub-networks early in training that can reach or exceed the robustness and accuracy of fully dense models (a minimal sketch of this prune-and-rewind loop appears after this list).
- Empirical Validation: Using the CIFAR-10 and CIFAR-100 benchmarks, the authors demonstrate that certain compressed models achieve state-of-the-art performance. In practical terms, these compressed models consume less memory and are viable for deployment in resource-constrained environments, such as autonomous space missions.
- Spectral Analysis: The paper provides a Fourier sensitivity analysis showing how lottery ticket-style compressed models differ from their dense counterparts under perturbations at different spatial frequencies (a sensitivity-heatmap sketch follows this list). The analysis ties the observed robustness gains to well-chosen compression strategies.
- CARD-Deck Strategy: A notable innovation is the domain-adaptive CARD-Deck ensemble, which dynamically selects models based on the spectral properties of incoming data (a gating sketch follows this list). The method leverages the strengths of individual CARDs to improve performance across diverse distribution shifts.
- Theoretical Underpinnings: The work extends theoretical guarantees, suggesting that sparse sub-networks exist which approximately match the accuracy and robustness of full models. This is supported by a function-approximation view of CARDs.
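To make the prune-and-rewind loop concrete, here is a minimal PyTorch sketch of lottery ticket-style iterative magnitude pruning with weight rewinding. The helper names (`global_magnitude_mask`, `lottery_ticket_prune`, the caller-supplied `train_fn`) and the global-threshold pruning rule are illustrative assumptions, not the paper's exact procedure.

```python
import copy
import torch

def global_magnitude_mask(model, masks, prune_fraction):
    """Prune the smallest-magnitude surviving weights across all weight tensors."""
    surviving = torch.cat([
        p.detach().abs().flatten()[masks[name].flatten().bool()]
        for name, p in model.named_parameters() if p.dim() > 1
    ])
    k = max(1, int(prune_fraction * surviving.numel()))
    threshold = torch.kthvalue(surviving, k).values
    return {
        name: ((p.detach().abs() > threshold) & masks[name].bool()).float()
        for name, p in model.named_parameters() if p.dim() > 1
    }

def lottery_ticket_prune(model, train_fn, rounds=5, prune_fraction=0.2):
    """Iteratively train, prune by magnitude, and rewind the surviving weights."""
    rewind_state = copy.deepcopy(model.state_dict())   # early-training snapshot
    masks = {name: torch.ones_like(p)
             for name, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train_fn(model, masks)                   # caller applies masks during training
        masks = global_magnitude_mask(model, masks, prune_fraction)
        model.load_state_dict(rewind_state)      # rewind weights; masks persist
    return masks
```

The key design choice is that surviving weights are rewound to an early-training snapshot after each pruning round, so the sparse sub-network is retrained from the same starting point rather than fine-tuned from the converged dense weights.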
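A Fourier sensitivity analysis of the kind referenced above can be approximated by measuring a model's error rate under single-frequency perturbations and collecting the results as a heatmap over spatial frequencies. The sketch below assumes a generic PyTorch classifier, inputs scaled to [0, 1], and a fixed perturbation norm `eps`; none of these details come from the paper.

```python
import numpy as np
import torch

def fourier_basis_image(h, w, i, j):
    """Real image whose 2-D spectrum has its energy concentrated at frequency (i, j)."""
    freq = np.zeros((h, w), dtype=complex)
    freq[i, j] = 1.0
    basis = np.real(np.fft.ifft2(np.fft.ifftshift(freq)))
    return basis / np.linalg.norm(basis)

def fourier_sensitivity(model, images, labels, eps=4.0):
    """Error rate under a single-frequency perturbation at each spatial frequency."""
    model.eval()
    _, _, h, w = images.shape
    heatmap = np.zeros((h, w))
    with torch.no_grad():
        for i in range(h):
            for j in range(w):
                noise = torch.as_tensor(fourier_basis_image(h, w, i, j),
                                        dtype=images.dtype, device=images.device)
                perturbed = (images + eps * noise).clamp(0, 1)
                preds = model(perturbed).argmax(dim=1)
                heatmap[i, j] = (preds != labels).float().mean().item()
    return heatmap
```

Comparing the heatmaps of a dense model and its compressed counterpart indicates which frequency bands each model is most sensitive to.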
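Finally, the CARD-Deck's gating idea can be sketched as routing each test batch to the CARD whose calibration signature most closely matches the batch's spectrum. The radially averaged power-spectrum signature and the Euclidean distance used below are illustrative assumptions, not the paper's exact gating mechanism.

```python
import numpy as np
import torch

def spectral_signature(images):
    """Radially averaged log power spectrum of an image batch shaped (N, C, H, W)."""
    gray = images.mean(dim=1).cpu().numpy()                            # (N, H, W)
    power = np.abs(np.fft.fftshift(np.fft.fft2(gray), axes=(-2, -1))) ** 2
    log_power = np.log1p(power).mean(axis=0)                           # (H, W)
    h, w = log_power.shape
    yy, xx = np.indices((h, w))
    radius = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2).astype(int)
    counts = np.maximum(np.bincount(radius.ravel()), 1)
    return np.bincount(radius.ravel(), weights=log_power.ravel()) / counts

def card_deck_predict(batch, cards, signatures):
    """Route a batch to the CARD whose stored calibration signature is closest."""
    sig = spectral_signature(batch)
    dists = [np.linalg.norm(sig - s) for s in signatures]
    chosen = cards[int(np.argmin(dists))]
    with torch.no_grad():
        return chosen(batch).argmax(dim=1)
```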
Implications and Future Directions
The implications of this work are twofold. Practically, it offers a pathway to deployable deep learning models in environments with limited computational resources, a key hurdle in fields like autonomous vehicles and space exploration. Theoretically, it challenges the notion that robustness requires large, dense models, pointing to opportunities for more efficient and sustainable deployment.
The exploration of lottery ticket-style approaches opens avenues for further research in efficient training schemes and model initialization techniques. Additionally, the CARD-Deck strategy prompts investigation into sophisticated modular architectures that dynamically adjust to data characteristics, which is a promising direction for future adaptive AI systems.
In conclusion, this paper contributes to the ongoing dialogue in deep learning research surrounding model efficiency and robustness, providing insights and methodologies that challenge conventional wisdom and offer substantial practical benefits.