Learning to Play Text-based Adventure Games with Maximum Entropy Reinforcement Learning (2302.10720v2)
Abstract: Text-based games are a popular testbed for language-based reinforcement learning (RL). In previous work, deep Q-learning is commonly used as the learning agent. Q-learning algorithms are challenging to apply to complex real-world domains, for example due to their instability in training. Therefore, in this paper, we adapt the soft actor-critic (SAC) algorithm to text-based environments. To deal with sparse extrinsic rewards from the environment, we combine it with a potential-based reward shaping technique to provide more informative (dense) reward signals to the RL agent. We apply our method to play difficult text-based games. The SAC method achieves higher scores than the Q-learning methods on many games with only half the number of training steps, which shows that it is well-suited for text-based games. Moreover, we show that the reward shaping technique helps the agent learn the policy faster and achieve higher scores. In particular, we consider a dynamically learned value function as a potential function for shaping the learner's original sparse reward signals.
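To make the shaping idea concrete, here is a minimal sketch (in PyTorch) of potential-based reward shaping where a learned state-value function serves as the potential. The network architecture, function names, and the assumption of dense vector encodings of game states are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

# Hypothetical state-value network used as the shaping potential Phi(s).
# Architecture and dimensions are placeholders, not the paper's model.
class ValuePotential(nn.Module):
    def __init__(self, state_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)


def shaped_reward(env_reward, state, next_state, potential, gamma=0.99, done=False):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    Setting Phi(s') = 0 at terminal states follows the standard convention
    that preserves the optimal policy of the original MDP.
    """
    with torch.no_grad():
        phi_s = potential(state)
        phi_next = torch.zeros_like(phi_s) if done else potential(next_state)
    return env_reward + gamma * phi_next - phi_s
```

Because the shaping term is a difference of potentials, the dense signal guides exploration without changing which policies are optimal, even when the potential itself is still being learned.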
Authors: Weichen Li, Rati Devidze, Sophie Fellenz