The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting -- An Analytical Model (2401.12617v2)

Published 23 Jan 2024 in cs.LG

Abstract: In continual learning, catastrophic forgetting is affected by multiple aspects of the tasks. Previous works have analyzed separately how forgetting is affected by either task similarity or overparameterization. In contrast, our paper examines how task similarity and overparameterization jointly affect forgetting in an analyzable model. Specifically, we focus on two-task continual linear regression, where the second task is a random orthogonal transformation of an arbitrary first task (an abstraction of random permutation tasks). We derive an exact analytical expression for the expected forgetting - and uncover a nuanced pattern. In highly overparameterized models, intermediate task similarity causes the most forgetting. However, near the interpolation threshold, forgetting decreases monotonically with the expected task similarity. We validate our findings with linear regression on synthetic data, and with neural networks on established permutation task benchmarks.

Citations (6)

Summary

  • The paper derives an exact analytical expression for expected catastrophic forgetting in two-task continual linear regression models.
  • It reveals a non-monotonic relationship between task similarity and forgetting when models are highly overparameterized.
  • Empirical tests using synthetic data and neural networks confirm the model's predictions, suggesting design strategies for continual learning.

Analytical Insights into Catastrophic Forgetting

Catastrophic Forgetting in Linear Regression Models

Catastrophic forgetting is a central challenge in continual learning, where a model is trained sequentially on multiple tasks. The authors concentrate on a two-task continual linear regression model and derive an exact analytical expression for the expected forgetting. The expression depends on the dimensionality of the transformed subspace, which governs the expected similarity between the two tasks. By constructing the second task as a random orthogonal transformation of the first, the model serves as an analyzable abstraction of the random permutation benchmarks widely used in the literature.
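
As a concrete illustration, the following sketch sets up two-task continual linear regression with a random orthogonal transformation between tasks. It assumes the standard continual linear regression training model in which gradient descent on each task converges to the interpolating solution closest to its initialization; the dimensions, the full-dimensional rotation, and the unchanged labels are illustrative choices rather than the paper's exact construction.

```python
import numpy as np

# A minimal sketch of two-task continual linear regression, assuming gradient
# descent on each task converges to the interpolator closest to its initialization.
rng = np.random.default_rng(0)
n, d = 40, 200                           # n samples, d features (overparameterized: d > n)

# Task 1: inputs X1 and labels from a linear teacher w_star.
X1 = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X1 @ w_star

# Task 2: same labels, inputs rotated by a random orthogonal matrix O
# (permutation tasks are the special case where O is a permutation matrix).
O, _ = np.linalg.qr(rng.standard_normal((d, d)))
X2 = X1 @ O

# Sequential training: task 1 from zero gives the minimum-norm interpolator;
# task 2, started from w1, gives the task-2 interpolator closest to w1.
w1 = np.linalg.pinv(X1) @ y
w2 = w1 + np.linalg.pinv(X2) @ (y - X2 @ w1)

# Forgetting on task 1: task-1 loss after task 2, minus the (zero) loss at w1.
forgetting = np.mean((X1 @ w2 - y) ** 2) - np.mean((X1 @ w1 - y) ** 2)
print(f"task-1 forgetting: {forgetting:.4f}")
```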

Overparameterization and Task Similarity Interplay

The interplay between overparameterization and task similarity reveals a nuanced pattern. When the model is highly overparameterized, the relationship between task similarity and catastrophic forgetting is non-monotonic: intermediate task similarity causes the most forgetting. Near the interpolation threshold, however, the relationship becomes monotonic, with forgetting decreasing as the expected task similarity increases. This shift shows that neither model capacity nor task resemblance alone determines how much is forgotten; their joint effect must be considered.
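
To make the quantity being characterized explicit, the expected forgetting in this two-task setting can be written (in generic notation that may differ from the paper's) as the expected increase in task-1 loss after training on task 2, with the expectation taken over the random orthogonal transformation $O$ that defines the second task:

```latex
% Generic definition of expected forgetting in the two-task setting;
% the notation is illustrative and may differ from the paper's.
\[
  \mathcal{F}
  \;=\;
  \mathbb{E}_{O}\!\left[\,\mathcal{L}_1(w_2) - \mathcal{L}_1(w_1)\,\right],
  \qquad
  \mathcal{L}_1(w) \;=\; \tfrac{1}{n}\,\lVert X_1 w - y_1 \rVert_2^{2}.
\]
```

Here $w_1$ is the solution obtained on the first task and $w_2$ the solution obtained after subsequently training on the second; in the overparameterized regime $\mathcal{L}_1(w_1) = 0$, so the forgetting reduces to the task-1 loss at $w_2$.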

Empirical Validation

The theoretical results predict how forgetting in linear regression depends jointly on task similarity, through the dimensionality of the transformed subspace, and on the degree of overparameterization. To corroborate these findings, the authors run experiments with synthetic data and linear models. Highly overparameterized models exhibit the predicted non-monotonic behavior, with intermediate task similarity proving the most challenging. Models with lower levels of overparameterization show a less pronounced non-monotonic pattern, with intermediate and highly dissimilar tasks causing comparable forgetting, consistent with the shift toward monotonic behavior near the interpolation threshold.
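
A rough way to reproduce the flavor of these synthetic experiments, under the illustrative assumption that the transformed-subspace dimension corresponds to an orthogonal transform acting on only k of the d input coordinates, is to sweep k at different overparameterization levels; this is a sketch, not the paper's exact protocol.

```python
import numpy as np

def expected_forgetting(n, d, k, trials=100, seed=0):
    """Monte-Carlo estimate of task-1 forgetting when a random orthogonal
    transform acts on k of the d input coordinates (illustrative construction:
    smaller k loosely corresponds to higher expected task similarity)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        X1 = rng.standard_normal((n, d))
        w_star = rng.standard_normal(d)
        y = X1 @ w_star
        O = np.eye(d)
        if k > 0:
            Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
            O[:k, :k] = Q                                # rotate only the first k coordinates
        X2 = X1 @ O
        w1 = np.linalg.pinv(X1) @ y                      # min-norm solution for task 1
        w2 = w1 + np.linalg.pinv(X2) @ (y - X2 @ w1)     # closest task-2 interpolator
        total += np.mean((X1 @ w2 - y) ** 2)             # task-1 loss after task 2
    return total / trials

n = 30
for d in (40, 300):   # near the interpolation threshold vs. highly overparameterized
    ks = [int(round(f * d)) for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print(d, {k: round(expected_forgetting(n, d, k), 3) for k in ks})
```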

Implications and Considerations for Neural Networks

The analytical findings are extended beyond synthetic data and linear regression models to include experiments employing neural networks in a permuted image setting. The results of these experiments align with the theoretical predictions, demonstrating the interaction between task similarity and overparameterization in neural network-based continual learning frameworks. This indicates a potential transferability of insights from linear regression models to more complex architectures, suggesting avenues for further research in mitigating catastrophic forgetting through model design and training regimen adaptations.
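
For reference, permuted-image benchmarks of the kind used in these experiments are typically constructed by fixing one random pixel permutation per task; the snippet below is a minimal, dataset-agnostic sketch in which random arrays stand in for real images.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_permuted_task(images, rng):
    """Apply one fixed random pixel permutation to a batch of flattened images.
    Labels are left unchanged; the permutation is a special case of an
    orthogonal transformation of the inputs."""
    flat = images.reshape(len(images), -1)
    perm = rng.permutation(flat.shape[1])
    return flat[:, perm]

# Illustrative stand-in data; in practice these would be real image tensors.
images = rng.random((1000, 28, 28))
task1_inputs = images.reshape(len(images), -1)   # original pixel order
task2_inputs = make_permuted_task(images, rng)   # same labels, permuted pixels
```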
