Supermasks in Superposition (2006.14769v3)

Published 26 Jun 2020 in cs.LG, cs.AI, and stat.ML

Abstract: We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting. Our approach uses a randomly initialized, fixed base network and for each task finds a subnetwork (supermask) that achieves good performance. If task identity is given at test time, the correct subnetwork can be retrieved with minimal memory usage. If not provided, SupSup can infer the task using gradient-based optimization to find a linear superposition of learned supermasks which minimizes the output entropy. In practice we find that a single gradient step is often sufficient to identify the correct mask, even among 2500 tasks. We also showcase two promising extensions. First, SupSup models can be trained entirely without task identity information, as they may detect when they are uncertain about new data and allocate an additional supermask for the new training distribution. Finally the entire, growing set of supermasks can be stored in a constant-sized reservoir by implicitly storing them as attractors in a fixed-sized Hopfield network.

Authors (7)
  1. Mitchell Wortsman (29 papers)
  2. Vivek Ramanujan (17 papers)
  3. Rosanne Liu (25 papers)
  4. Aniruddha Kembhavi (79 papers)
  5. Mohammad Rastegari (57 papers)
  6. Jason Yosinski (31 papers)
  7. Ali Farhadi (138 papers)
Citations (259)

Summary

  • The paper introduces SupSup, a continual learning model that uses task-specific supermasks to prevent catastrophic forgetting without altering fixed network weights.
  • It employs a rapid, single gradient step for task inference, enabling efficient selection from thousands of potential tasks.
  • Experimental results on benchmarks such as SplitCIFAR100 and PermutedMNIST show that SupSup matches or outperforms existing continual learning baselines across a range of scenarios.

An Analysis of "Supermasks in Superposition"

The paper "Supermasks in Superposition" by Wortsman et al. addresses a critical challenge in neural network-based continual learning: catastrophic forgetting. Traditional neural networks struggle with sequential task learning, often experiencing significant performance degradation on previous tasks when introduced to new ones, a phenomenon known as catastrophic forgetting. The paper introduces the concept of "Supermasks in Superposition" (SupSup), presenting a model that can sequentially learn thousands of tasks without suffering from such forgetting.

The SupSup model operates on a fixed, randomly initialized base network, over which task-specific subnetworks, referred to as supermasks, are applied. Each supermask is a binary mask applied elementwise to the network's weights, so that only a subset of the weights is active for a given task. This masking approach exploits the expressive power of randomly initialized networks, building on prior findings that such networks already contain subnetworks capable of solving complex tasks.
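
To make this concrete, below is a minimal PyTorch-style sketch of a linear layer whose weights stay frozen at their random initialization and whose only trainable parameters are per-weight scores; the binary supermask keeps the top-scoring fraction of weights via a straight-through estimator, in the spirit of the edge-popup procedure the paper builds on. Names such as `MaskedLinear` and `keep_frac` are illustrative assumptions, not the authors' exact API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GetSubnet(torch.autograd.Function):
    """Straight-through estimator: binarize scores into a top-k mask on the
    forward pass, pass gradients through unchanged on the backward pass."""
    @staticmethod
    def forward(ctx, scores, keep_frac):
        mask = torch.zeros_like(scores)
        _, idx = scores.flatten().sort()
        j = int((1 - keep_frac) * scores.numel())
        mask.flatten()[idx[j:]] = 1.0   # keep the highest-scoring fraction of weights
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None

class MaskedLinear(nn.Module):
    """Linear layer with fixed random weights; only the mask scores are trained."""
    def __init__(self, in_features, out_features, keep_frac=0.5):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(out_features, in_features),
                                   requires_grad=False)   # frozen base weights
        self.scores = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.keep_frac = keep_frac

    def forward(self, x):
        mask = GetSubnet.apply(self.scores.abs(), self.keep_frac)
        return F.linear(x, self.weight * mask)
```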

The SupSup model rests on two key components:

  1. Task-Specific Supermasks: For each task, a supermask is learned, effectively creating a task-specific subnetwork without modifying the underlying weights. This approach inherently avoids catastrophic forgetting by maintaining the integrity of weights used for previous tasks.
  2. Task Inference via Optimization: When task identity is not provided at inference time, the model infers it by optimizing the coefficients of a linear superposition of all learned supermasks so as to minimize the entropy of the network's output (see the sketch after this list). Notably, a single gradient step is typically sufficient: the task whose coefficient has the most negative entropy gradient is selected, which lets the model identify the correct task among thousands efficiently.
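
The following sketch illustrates this one-shot, entropy-based inference step under simplifying assumptions: each learned supermask is a single tensor and `model(x, mask)` runs the fixed network under that mask. Both interfaces are hypothetical stand-ins for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def infer_task(model, masks, x):
    """One-shot task inference: run the network under a uniform superposition of
    all learned supermasks and pick the task whose mixing coefficient most
    reduces the output entropy (most negative gradient)."""
    num_tasks = len(masks)
    alphas = torch.full((num_tasks,), 1.0 / num_tasks, requires_grad=True)
    mixed = sum(a * m for a, m in zip(alphas, masks))   # superimposed mask
    probs = F.softmax(model(x, mixed), dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
    entropy.backward()                    # a single gradient step suffices in practice
    return int(alphas.grad.argmin())      # inferred task index
```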

The paper formalizes a taxonomy of continual learning scenarios, categorized by whether task identity is available during training and during testing, and by whether tasks share output labels. SupSup is shown to perform robustly across this taxonomy, including scenarios where task identities are hidden both during training and testing. In these cases, SupSup can detect when incoming data does not match any known task, allocate a new supermask, and learn the new task effectively.

Moreover, the paper explores storing supermasks as attractors in a fixed-size Hopfield network, so that the growing set of task-specific supermasks is maintained within a constant memory footprint. This holds additional promise for computationally efficient implementations in resource-constrained environments.
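
As a rough illustration of this idea, the sketch below stores flattened binary masks as attractors in a classical Hopfield network via a Hebbian outer-product rule and recovers them with iterated sign updates. The paper's extension instead learns the Hopfield weights alongside each new supermask, so this is a simplified stand-in for the storage mechanism, not the authors' algorithm.

```python
import torch

def store_masks(masks):
    """Store binary supermasks (flattened to +/-1 patterns) in one fixed-size
    Hopfield weight matrix using the Hebbian outer-product rule."""
    patterns = torch.stack([2.0 * m.flatten() - 1.0 for m in masks])  # {0,1} -> {-1,+1}
    n = patterns.shape[1]
    W = patterns.t() @ patterns / n
    W.fill_diagonal_(0.0)
    return W

def recover_mask(W, probe, steps=10):
    """Relax a (possibly noisy) probe mask to the nearest stored attractor."""
    s = 2.0 * probe.flatten() - 1.0
    for _ in range(steps):
        s = torch.sign(W @ s)
        s[s == 0] = 1.0                 # break ties deterministically
    return (s > 0).float().reshape(probe.shape)
```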

The experimental evaluation highlights SupSup's efficacy across various datasets, including SplitCIFAR100, SplitImageNet, PermutedMNIST, and RotatedMNIST. The model competes favorably with, and frequently surpasses, existing baselines both in scenarios where task identity is given (GG scenarios) and in the more challenging settings where it must be inferred (GNu and NNs scenarios). The ability to allocate new supermasks dynamically suggests a significant advantage over approaches that require detailed task information or direct weight modification.

The implications of these results are broad, suggesting practical applications in settings that require robust performance across numerous tasks without extensive retraining or access to prior task data. Theoretically, the work offers fresh insight into leveraging random networks, suggesting that architecture design could focus on efficient mask training rather than retraining weights. This perspective could influence future AI systems, emphasizing structural adaptability and memory efficiency over conventional, resource-heavy learning processes.

Future extensions might explore self-supervised learning techniques to improve task inference accuracy further, manage finer-grained task distinctions, or completely automate the restructuring of overlaid task-specific networks. Another promising avenue is exploring non-vision domains or examining how SupSup could adapt or integrate seamlessly with reinforcement learning paradigms.

In summary, "Supermasks in Superposition" offers a compelling contribution to continual learning research, handling large numbers of tasks without succumbing to catastrophic forgetting. It paves the way for more efficient, adaptive neural network systems capable of accommodating a far larger array of tasks than current methods allow.