
A Second-Order Perspective on Model Compositionality and Incremental Learning (2405.16350v2)

Published 25 May 2024 in cs.AI and cs.LG

Abstract: The fine-tuning of deep pre-trained models has revealed compositional properties, with multiple specialized modules that can be arbitrarily composed into a single, multi-task model. However, identifying the conditions that promote compositionality remains an open issue, with recent efforts concentrating mainly on linearized networks. We conduct a theoretical study that attempts to demystify compositionality in standard non-linear networks through the second-order Taylor approximation of the loss function. The proposed formulation highlights the importance of staying within the pre-training basin to achieve composable modules. Moreover, it provides the basis for two dual incremental training algorithms: one approaches the problem from the perspective of multiple models trained individually, while the other optimizes the composed model as a whole. We probe their application to incremental classification tasks and highlight some valuable capabilities. Indeed, the pool of incrementally learned modules not only supports the creation of an effective multi-task model but also enables unlearning and specialization on certain tasks.
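
To make the abstract's second-order perspective concrete, here is a minimal sketch of the quadratic expansion it refers to. The notation below ($\theta_0$ for the pre-trained weights, residuals $\tau_t$, per-task Hessians $H_t$, and the additive composition rule) is illustrative and assumed for this sketch, not taken verbatim from the paper.

\[
\mathcal{L}_t(\theta_0 + \tau) \;\approx\; \mathcal{L}_t(\theta_0) \;+\; \nabla\mathcal{L}_t(\theta_0)^{\top}\tau \;+\; \tfrac{1}{2}\,\tau^{\top} H_t\, \tau,
\qquad
\theta_{\mathrm{multi}} \;=\; \theta_0 + \sum_{t=1}^{T} \tau_t, \quad \tau_t = \theta_t - \theta_0 .
\]

Under this quadratic model, if each fine-tuned solution $\theta_t$ is a near-stationary point of its own task loss, the excess loss the composed model incurs on task $t$ reduces to $\tfrac{1}{2}\big(\sum_{i\neq t}\tau_i\big)^{\top} H_t \big(\sum_{i\neq t}\tau_i\big)$, so composition is benign whenever the other modules' residuals are small or lie along flat directions of $H_t$. This is one way to read the abstract's requirement of staying within the pre-training basin, where the quadratic approximation remains valid.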

Authors (6)
  1. Angelo Porrello (32 papers)
  2. Lorenzo Bonicelli (13 papers)
  3. Pietro Buzzega (11 papers)
  4. Monica Millunzi (2 papers)
  5. Simone Calderara (64 papers)
  6. Rita Cucchiara (142 papers)
Citations (4)
