On the Effects of Fine-tuning Language Models for Text-Based Reinforcement Learning (2404.10174v1)

Published 15 Apr 2024 in cs.CL

Abstract: Text-based reinforcement learning involves an agent interacting with a fictional environment using observed text and admissible actions in natural language to complete a task. Previous works have shown that agents can succeed in text-based interactive environments even in the complete absence of semantic understanding or other linguistic capabilities. The success of these agents in playing such games suggests that semantic understanding may not be important for the task. This raises an important question about the benefits of LMs in guiding the agents through the game states. In this work, we show that rich semantic understanding leads to efficient training of text-based RL agents. Moreover, we describe the occurrence of semantic degeneration as a consequence of inappropriate fine-tuning of LMs in text-based reinforcement learning (TBRL). Specifically, we describe the shift in the semantic representation of words in the LM, as well as how it affects the performance of the agent in tasks that are semantically similar to the training games. We believe these results may help develop better strategies to fine-tune agents in text-based RL scenarios.
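To make the setup concrete, below is a minimal sketch of the action-selection step in a text-based RL agent: the agent observes a text description and scores each admissible natural-language action against it. Everything here is illustrative rather than the paper's method: `encode` is a hypothetical stand-in for the fine-tuned LM encoder under study (a plain bag-of-words vector so the example runs self-contained), and `select_action` is a generic greedy policy, not the authors' agent.

```python
import math
from collections import Counter

def encode(text: str) -> Counter:
    """Hypothetical stand-in for a fine-tuned LM encoder.

    A bag-of-words count vector keeps the sketch self-contained; the paper's
    setting would use contextual embeddings from a language model instead.
    """
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_action(observation: str, admissible_actions: list[str]) -> str:
    """Greedily pick the admissible action closest to the observed text."""
    obs_vec = encode(observation)
    return max(admissible_actions, key=lambda act: similarity(encode(act), obs_vec))

# One step of the interaction loop: observe text, choose an admissible command.
obs = "You are in the kitchen. A red apple sits on the table."
actions = ["take apple", "open fridge", "go north"]
print(select_action(obs, actions))  # -> "take apple"
```

The connection to the paper's finding: when the real encoder is fine-tuned on game reward alone, the word representations that this scoring step depends on can drift away from their original semantics. That drift is the "semantic degeneration" the abstract describes, and it is what hurts the agent on tasks semantically similar to the training games.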

Authors (4)
  1. Soham Dan (41 papers)
  2. Keerthiram Murugesan (38 papers)
  3. Subhajit Chaudhury (40 papers)
  4. Mauricio Gruppi (2 papers)
