Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning (2402.00085v2)

Published 31 Jan 2024 in cs.LG and cs.AI

Abstract: Training task-oriented dialog agents with reinforcement learning is time-consuming and requires a large number of interactions with real users. How to learn an effective dialog policy from limited dialog experience remains an obstacle that makes agent training inefficient. In addition, most previous frameworks begin training with randomly chosen samples, which differs from how humans learn and hurts the efficiency and stability of training. We therefore propose Scheduled Curiosity-Deep Dyna-Q (SC-DDQ), a curiosity-driven curriculum learning framework built on a state-of-the-art model-based reinforcement learning dialog model, Deep Dyna-Q (DDQ). Furthermore, we designed learning schedules for SC-DDQ and DDQ, respectively, following two opposite training strategies: classic curriculum learning and its reverse. Our results show that introducing scheduled learning and curiosity leads to a significant improvement over DDQ and Deep Q-learning (DQN). Surprisingly, we found that traditional curriculum learning is not always effective: according to the experimental results, the easy-first and difficult-first strategies are more suitable for SC-DDQ and DDQ, respectively. To analyze our results, we adopted the entropy of sampled actions to characterize action exploration, and found that training strategies with high entropy in the first stage and low entropy in the last stage lead to better performance.
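
The abstract combines three mechanisms: a curiosity bonus that rewards the agent for visiting novel transitions, a difficulty-ordered training schedule, and an action-entropy diagnostic for exploration. Below is a minimal, illustrative Python sketch of those three pieces, assuming an ICM-style forward-model prediction error as the curiosity reward; all class and function names are hypothetical and not taken from the paper's implementation.

```python
# Illustrative sketch only -- not the authors' code. Assumes an ICM-style
# forward-model curiosity bonus; all names here are hypothetical.
import numpy as np
import torch
import torch.nn as nn


class ForwardModel(nn.Module):
    """Predicts the next dialog state from (state, action); its prediction
    error is used as the intrinsic curiosity reward."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action_onehot: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action_onehot], dim=-1))


def curiosity_reward(model: ForwardModel, state, action_onehot, next_state,
                     eta: float = 0.5) -> torch.Tensor:
    """Intrinsic reward = scaled forward-model prediction error, added to
    the task reward so poorly modeled (novel) transitions are revisited."""
    with torch.no_grad():
        pred = model(state, action_onehot)
    return eta * 0.5 * (pred - next_state).pow(2).sum(dim=-1)


def schedule_goals(goals, difficulty, easy_first=True):
    """Order training user goals by a difficulty score: easy-first for
    classic curriculum learning, difficult-first for its reverse."""
    return sorted(goals, key=difficulty, reverse=not easy_first)


def action_entropy(actions, num_actions: int) -> float:
    """Shannon entropy of the empirical action distribution within one
    training stage; high entropy indicates broad action exploration."""
    counts = np.bincount(actions, minlength=num_actions).astype(float)
    probs = counts / counts.sum()
    probs = probs[probs > 0]
    return float(-(probs * np.log(probs)).sum())
```

Under this reading, the paper's finding would correspond to `action_entropy` being high in early training stages (broad, curiosity-driven exploration) and tapering off in later stages as the policy converges.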

Authors (3)
  1. Xuecheng Niu (1 paper)
  2. Akinori Ito (3 papers)
  3. Takashi Nose (4 papers)
Citations (1)