Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning (2405.00516v1)

Published 1 May 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Recent advancements in large language models (LLMs) have demonstrated remarkable improvements in various NLP tasks such as web navigation. Supervised learning (SL) approaches have achieved impressive performance while utilizing significantly less training data compared to previous methods. However, these SL-based models fall short when compared to reinforcement learning (RL) approaches, which have shown superior results. In this paper, we propose a novel approach that combines SL and RL techniques over the MiniWoB benchmark to leverage the strengths of both methods. We also address a critical limitation in previous models' understanding of HTML content, revealing a tendency to memorize target elements rather than comprehend the underlying structure. To rectify this, we propose methods to enhance true understanding and present a new baseline of results. Our experiments demonstrate that our approach outperforms previous SL methods on certain tasks using less data and narrows the performance gap with RL models, achieving 43.58% average accuracy in SL and 36.69% when combined with a multimodal RL approach. This study sets a new direction for future web navigation and offers insights into the limitations and potential of language modeling for computer tasks.
