Maslow's Hammer for Catastrophic Forgetting: Node Re-Use vs Node Activation (2205.09029v1)

Published 18 May 2022 in stat.ML and cs.LG

Abstract: Continual learning - learning new tasks in sequence while maintaining performance on old tasks - remains particularly challenging for artificial neural networks. Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-student framework and a real-data setup to provide an explanation of this phenomenon, which we name the Maslow's hammer hypothesis. Our analysis reveals a trade-off between node activation and node re-use that results in the worst forgetting occurring in the intermediate regime. Using this understanding, we reinterpret popular algorithmic interventions for catastrophic interference in terms of this trade-off and identify the regimes in which they are most effective.
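
The teacher-student setup described in the abstract can be made concrete with a small simulation. Below is a minimal sketch, not the paper's actual analysis (the paper studies a soft committee machine teacher-student framework): two linear teachers whose weight vectors share a tunable cosine similarity stand in for the two tasks, a small ReLU student is trained on them in sequence, and forgetting is measured as the increase in task-1 loss after training on task 2. All dimensions, learning rates, and epoch counts here are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, N = 30, 20, 500  # input dim, student hidden width, samples per task

def make_teachers(similarity):
    """Two unit-norm linear teachers with a given cosine similarity."""
    w1 = rng.standard_normal(D)
    w1 /= np.linalg.norm(w1)
    z = rng.standard_normal(D)
    z -= (z @ w1) * w1                      # component orthogonal to w1
    z /= np.linalg.norm(z)
    return w1, similarity * w1 + np.sqrt(1.0 - similarity**2) * z

def train(student, X, y, lr=0.01, epochs=300):
    """Full-batch gradient descent on mean squared error (updates in place)."""
    W, v = student
    for _ in range(epochs):
        H = np.maximum(W @ X.T, 0.0)        # (K, N) hidden activations
        err = v @ H - y                     # (N,) residuals
        gv = (H * err).mean(axis=1)
        gW = ((v[:, None] * (H > 0)) * err) @ X / len(y)
        v -= lr * gv
        W -= lr * gW

def mse(student, X, y):
    W, v = student
    return float(np.mean((v @ np.maximum(W @ X.T, 0.0) - y) ** 2))

def forgetting(similarity):
    """Increase in task-1 loss after sequentially training on task 2."""
    w1, w2 = make_teachers(similarity)
    X1, X2 = rng.standard_normal((N, D)), rng.standard_normal((N, D))
    y1, y2 = X1 @ w1, X2 @ w2
    student = [0.1 * rng.standard_normal((K, D)), 0.1 * rng.standard_normal(K)]
    train(student, X1, y1)
    loss_before = mse(student, X1, y1)
    train(student, X2, y2)                  # sequential training on task 2
    return mse(student, X1, y1) - loss_before

for s in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(f"task similarity {s:.2f}: forgetting {forgetting(s):.4f}")
```

At similarity 1.0 the two tasks coincide, so task-2 training cannot hurt task-1 performance; how sharply forgetting peaks at intermediate similarity in this toy depends on the seed and hyperparameters, but that is the regime the paper's node-activation versus node-re-use trade-off predicts to be hardest.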

Authors (5)
  1. Sebastian Lee (7 papers)
  2. Stefano Sarao Mannelli (21 papers)
  3. Claudia Clopath (24 papers)
  4. Sebastian Goldt (33 papers)
  5. Andrew Saxe (20 papers)
Citations (10)
