Overcoming Catastrophic Forgetting by Exemplar Selection in Task-oriented Dialogue System (2405.10992v1)

Published 16 May 2024 in cs.LG and cs.AI

Abstract: Intelligent task-oriented dialogue systems (ToDs) are expected to continuously acquire new knowledge, also known as Continual Learning (CL), which is crucial for fitting ever-changing user needs. However, catastrophic forgetting dramatically degrades model performance in the face of a long streamed curriculum. In this paper, we aim to overcome the forgetting problem in ToDs and propose a method (HESIT) with a hyper-gradient-based exemplar strategy, which samples influential exemplars for periodic retraining. Instead of unilaterally observing data or models, HESIT adopts a profound exemplar selection strategy that considers the general performance of the trained model when selecting exemplars for each task domain. Specifically, HESIT analyzes the influence of training data by tracing its hyper-gradient in the optimization process. Furthermore, HESIT avoids estimating the Hessian, making it compatible with ToDs that use a large pre-trained model. Experimental results show that HESIT effectively alleviates catastrophic forgetting by exemplar selection and achieves state-of-the-art performance on the largest CL benchmark of ToDs in terms of all metrics.
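The exemplar-selection idea described in the abstract (scoring training examples by tracing their influence on the model's general performance through the optimization process, without any Hessian computation) can be illustrated with a Hessian-free, first-order sketch. The snippet below is not the authors' HESIT implementation; it is a minimal TracIn-style approximation in PyTorch that accumulates, over saved training checkpoints, the dot product between each training example's gradient and the gradient of a held-out validation loss, then keeps the top-scoring examples as replay exemplars. All function names (per_example_grad, score_exemplars, select_topk) and the checkpoint and learning-rate handling are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F


def per_example_grad(model, x, y):
    """Flattened gradient of the loss for one (input tensor, label tensor) pair."""
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.reshape(-1) for g in grads])


def score_exemplars(model, checkpoints, train_set, val_set, lr=1e-3):
    """Hessian-free influence score per training example:
    sum over checkpoints of lr * <grad(train example), grad(validation loss)>."""
    scores = torch.zeros(len(train_set))
    xs_val = torch.stack([x for x, _ in val_set])   # assumes tensor inputs/labels
    ys_val = torch.stack([y for _, y in val_set])
    for state_dict in checkpoints:                  # parameter snapshots saved during training
        model.load_state_dict(state_dict)
        val_loss = F.cross_entropy(model(xs_val), ys_val)
        val_grad = torch.cat([g.reshape(-1) for g in torch.autograd.grad(
            val_loss, [p for p in model.parameters() if p.requires_grad])])
        # Each example's alignment with the validation gradient approximates
        # its contribution to general performance at this point in training.
        for i, (x, y) in enumerate(train_set):
            scores[i] += lr * torch.dot(per_example_grad(model, x, y), val_grad).item()
    return scores


def select_topk(train_set, scores, k=50):
    """Keep the k most influential examples as exemplars for periodic retraining."""
    idx = torch.topk(scores, k=min(k, len(train_set))).indices.tolist()
    return [train_set[i] for i in idx]
```

In HESIT proper, the influence signal comes from hyper-gradients traced during optimization rather than a post hoc pass over checkpoints, but the selection principle sketched here is the same: rank training examples by their measured effect on the trained model's general performance, then retain the top-ranked ones for each task domain as replay exemplars.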

