
P2DT: Mitigating Forgetting in Task-Incremental Learning with Progressive Prompt Decision Transformer (2401.11666v1)

Published 22 Jan 2024 in cs.LG and cs.AI

Abstract: Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model, causing performance degradation when these agents face new tasks. In our work, we propose a novel solution - the Progressive Prompt Decision Transformer (P2DT). This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies. Our approach mitigates forgetting in continual and offline reinforcement learning scenarios. Moreover, P2DT leverages trajectories collected via traditional reinforcement learning from all tasks and generates new task-specific tokens during training, thereby retaining knowledge from previous tasks. Preliminary results demonstrate that our model effectively alleviates catastrophic forgetting and scales well as the number of task environments increases.

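The mechanism described in the abstract is prompt-based parameter isolation: each new task contributes a small set of learnable prompt ("decision") tokens prepended to the trajectory sequence, while prompts learned for earlier tasks are frozen so their policies are preserved. The sketch below illustrates that idea in PyTorch; it is not the authors' implementation, and the class name `ProgressivePromptDT`, the `add_task` method, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProgressivePromptDT(nn.Module):
    """Minimal sketch (not the authors' code) of progressive prompt tokens on a
    decision-transformer-style backbone: each task gets its own learnable prompt
    tokens prepended to the trajectory sequence, and prompts from earlier tasks
    are frozen to preserve the policies learned for those tasks."""

    def __init__(self, embed_dim: int = 128, n_layers: int = 3,
                 n_heads: int = 4, prompt_len: int = 5):
        super().__init__()
        self.embed_dim = embed_dim
        self.prompt_len = prompt_len
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.prompts = nn.ParameterList()  # grows by one entry per task

    def add_task(self):
        """Freeze existing prompts and append a fresh one for the new task."""
        for p in self.prompts:
            p.requires_grad_(False)
        new_prompt = nn.Parameter(torch.randn(self.prompt_len, self.embed_dim) * 0.02)
        self.prompts.append(new_prompt)

    def forward(self, token_embeds: torch.Tensor, task_id: int) -> torch.Tensor:
        # token_embeds: (batch, seq_len, embed_dim) embeddings of the
        # return-to-go / state / action tokens of a trajectory segment.
        batch = token_embeds.size(0)
        prompt = self.prompts[task_id].unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([prompt, token_embeds], dim=1)
        # Causal mask, since a decision transformer predicts tokens autoregressively.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        out = self.backbone(x, mask=mask)
        # Drop the prompt positions; an action head would read the remaining tokens.
        return out[:, self.prompt_len:, :]
```

Usage would be to call `add_task()` before training on each new task and then optimize only the newest prompt (plus, optionally, the shared backbone on the first task), so the per-task parameter cost stays small as the number of tasks grows.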