- The paper derives an analytical expression for worst-case catastrophic forgetting in two-task linear regression models.
- It reveals a non-monotonic relationship between task similarity and forgetting when models are highly overparameterized.
- Empirical tests on synthetic data and neural networks confirm the theoretical predictions, suggesting design strategies for continual learning.
Analytical Insights into Catastrophic Forgetting
Catastrophic Forgetting in Linear Regression Models
Catastrophic forgetting is a central challenge in continual learning, where a model is trained sequentially on multiple tasks. The authors study a two-task continual linear regression model and derive an exact analytical expression for worst-case forgetting. The expression depends on the dimensionality of the transformed subspace (DOTS), which serves as the paper's measure of task similarity. The second task is constructed by applying a random orthogonal operator to a subspace of the first task; since a permutation is itself an orthogonal operator, this setup generalizes the permutation tests widely used in the existing literature.
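To make the setup concrete, here is a minimal sketch (my construction, not the paper's code) of such a task pair: the second task's ground-truth weights are obtained by applying a random orthogonal operator to the first `dots` coordinates of the first task's weights, so `dots = 0` yields identical tasks and `dots = d` a fully rotated one.

```python
import numpy as np

def make_task_pair(n, d, dots, rng):
    """Two linear regression tasks whose similarity is controlled by `dots`.

    n    : samples per task (n << d means heavy overparameterization)
    d    : input dimension
    dots : dimensionality of the transformed subspace (0 = identical tasks)
    """
    w1 = rng.standard_normal(d)               # task-1 ground-truth weights
    Q = np.eye(d)
    if dots > 0:
        A = rng.standard_normal((dots, dots))
        Q[:dots, :dots], _ = np.linalg.qr(A)  # random orthogonal block
    w2 = Q @ w1                               # task 2 = rotated copy of task 1
    X1 = rng.standard_normal((n, d))
    X2 = rng.standard_normal((n, d))
    return X1, X1 @ w1, X2, X2 @ w2           # noiseless labels y = X w
```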
Overparameterization and Task Similarity Interplay
The interplay between overparameterization and task similarity drives the forgetting behavior. When the model is highly overparameterized, the relationship between task similarity and catastrophic forgetting is non-monotonic: tasks of intermediate similarity cause the most forgetting. As the model approaches critical parameterization, however, the relationship becomes monotonic. This shift shows that the difficulty of a task sequence cannot be read off task resemblance or model capacity alone; the two factors interact.
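The training dynamics usually assumed in this line of analysis can be sketched as follows: each task is fit to convergence by gradient descent, which in the overparameterized regime ends at the interpolating solution closest in Euclidean norm to the starting weights, and forgetting is the increase in task-1 loss after training on task 2. The helper names below are mine, for illustration only.

```python
import numpy as np

def fit_from(w_prev, X, y):
    """Project w_prev onto {w : Xw = y}: where gradient descent converges
    in the overparameterized regime when initialized at w_prev."""
    return w_prev + np.linalg.pinv(X) @ (y - X @ w_prev)

def forgetting(X1, y1, X2, y2, d):
    w_a = fit_from(np.zeros(d), X1, y1)        # train on task 1
    w_b = fit_from(w_a, X2, y2)                # then on task 2
    loss_then = np.mean((X1 @ w_a - y1) ** 2)  # ~0 when n < d
    loss_now = np.mean((X1 @ w_b - y1) ** 2)   # task-1 loss after task 2
    return loss_now - loss_then
```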
Empirical Validation
The theory predicts forgetting in linear regression as a joint function of task similarity, measured by DOTS, and the degree of overparameterization. To corroborate these predictions, the authors run experiments with synthetic data and linear models. Highly overparameterized models indeed show the non-monotonic pattern, with intermediate task similarity proving most challenging. Models with lower levels of overparameterization exhibit less pronounced non-monotonicity, with intermediate and highly dissimilar tasks being comparably difficult.
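A synthetic experiment in this spirit is a sweep over the DOTS parameter at different levels of overparameterization, reusing the hypothetical helpers sketched above; the exact protocol, sample sizes, and averaging in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100
for n in (10, 80):                        # heavy vs. mild overparameterization
    curve = []
    for dots in range(0, d + 1, 20):
        runs = [forgetting(*make_task_pair(n, d, dots, rng), d)
                for _ in range(25)]       # average over random task draws
        curve.append((dots, round(float(np.mean(runs)), 3)))
    print(f"n={n}:", curve)
```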
Implications and Considerations for Neural Networks
The analysis is extended beyond synthetic data and linear regression through experiments with neural networks in a permuted-image setting. The results align with the theoretical predictions, exhibiting the same interaction between task similarity and overparameterization in neural-network continual learning. This suggests that insights from the linear model transfer to more complex architectures, and it points to mitigating catastrophic forgetting through choices of model design and training regimen.
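One plausible way to carry the DOTS knob into the permuted-image setting (an assumption on my part; the paper may permute all pixels or use another scheme) is to permute only a subset of pixel positions: a permutation is itself an orthogonal operator, so the subset size plays the role of the transformed-subspace dimension.

```python
import numpy as np

def make_permuted_task(images, num_permuted, rng):
    """Permute `num_permuted` randomly chosen pixel positions of each image.

    images : (n_samples, n_pixels) array of flattened images
    """
    n_pixels = images.shape[1]
    idx = rng.choice(n_pixels, size=num_permuted, replace=False)
    permuted = images.copy()
    permuted[:, idx] = images[:, idx[rng.permutation(num_permuted)]]
    return permuted
```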