Elastic Multi-Gradient Descent for Parallel Continual Learning (2401.01054v1)

Published 2 Jan 2024 in cs.LG and cs.AI

Abstract: The goal of Continual Learning (CL) is to continuously learn from new data streams and accomplish the corresponding tasks. Previously studied CL assumes that data for different tasks arrive strictly one after another, and therefore corresponds to Serial Continual Learning (SCL). This paper studies the novel paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios, where a diverse set of tasks is encountered at different time points. PCL is challenging because an unspecified number of tasks, each at a different stage of learning progress, must be trained together, making it difficult to guarantee effective model updates for all encountered tasks. Our previous conference work focused on measuring and reducing the discrepancy among gradients in a multi-objective optimization problem, but the resulting updates may still introduce negative transfer. To address this issue in the dynamic multi-objective optimization problem, we introduce task-specific elastic factors that adjust the descent direction towards the Pareto front. The proposed method, called Elastic Multi-Gradient Descent (EMGD), ensures that each update follows an appropriate Pareto descent direction, minimizing the negative impact on previously learned tasks. To balance training between old and new tasks, we also propose a memory editing mechanism guided by the gradient computed with EMGD. This editing process updates the stored data points, reducing interference from previous tasks in the Pareto descent direction. Experiments on public datasets validate the effectiveness of EMGD in the PCL setting.
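To make the Pareto descent idea concrete, below is a minimal sketch of an MGDA-style common descent direction with optional per-task elastic scaling. The abstract does not specify how EMGD derives its elastic factors or solves for the task weights, so the `elastic` argument and the projected-gradient solver here are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch (not the paper's implementation): an MGDA-style common
# descent direction d = -sum_i alpha_i * g_i, where alpha lies on the simplex
# and minimizes ||sum_i alpha_i g_i||^2. The `elastic` argument is a
# hypothetical per-task scaling standing in for EMGD's elastic factors.
import torch

def pareto_descent_direction(grads, elastic=None, n_iters=50, lr=0.1):
    """grads: list of per-task gradient tensors (same shape).
    elastic: optional per-task factors rescaling each gradient (assumed form).
    Returns (d, alpha): the common descent direction and the simplex weights."""
    G = torch.stack([g.detach().flatten() for g in grads])       # (T, P)
    if elastic is not None:                                      # assumed elastic scaling
        G = G * torch.as_tensor(elastic, dtype=G.dtype).view(-1, 1)
    T = G.shape[0]
    alpha = torch.full((T,), 1.0 / T, requires_grad=True)
    opt = torch.optim.SGD([alpha], lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        combined = (alpha.unsqueeze(1) * G).sum(dim=0)           # sum_i alpha_i g_i
        loss = combined.dot(combined)                            # squared norm (MGDA objective)
        loss.backward()
        opt.step()
        with torch.no_grad():                                    # project back onto the simplex
            alpha.clamp_(min=0.0)
            alpha.div_(alpha.sum().clamp(min=1e-12))
    with torch.no_grad():
        d = -(alpha.unsqueeze(1) * G).sum(dim=0)
    return d, alpha.detach()

# Example with made-up shapes: three tasks, a 10-parameter model.
gs = [torch.randn(10) for _ in range(3)]
d, alpha = pareto_descent_direction(gs, elastic=[1.0, 0.5, 2.0])
```

In a PCL setting, the per-task gradients would typically come from the tasks currently in training plus gradients computed on stored memory samples, and the returned direction `d` would replace a single-task gradient when updating the shared parameters; the paper's memory editing step then uses such a gradient to adjust the stored data points.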
