Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning (2405.00516v1)
Abstract: Recent advancements in LLMs have demonstrated remarkable improvements in various NLP tasks such as web navigation. Supervised learning (SL) approaches have achieved impressive performance while utilizing significantly less training data compared to previous methods. However, these SL-based models fall short when compared to reinforcement learning (RL) approaches, which have shown superior results. In this paper, we propose a novel approach that combines SL and RL techniques over the MiniWoB benchmark to leverage the strengths of both methods. We also address a critical limitation in previous models' understanding of HTML content, revealing a tendency to memorize target elements rather than comprehend the underlying structure. To rectify this, we propose methods to enhance true understanding and present a new baseline of results. Our experiments demonstrate that our approach outperforms previous SL methods on certain tasks using less data and narrows the performance gap with RL models, achieving 43.58% average accuracy in SL and 36.69% when combined with a multimodal RL approach. This study sets a new direction for future web navigation and offers insights into the limitations and potential of language modeling for computer tasks.