
"What's important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces (2312.06147v1)

Published 11 Dec 2023 in cs.CL and cs.IR
"What's important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces

Abstract: LLMs that have been trained on a corpus that includes a large amount of code exhibit a remarkable ability to understand HTML. As web interfaces are primarily constructed using HTML, we design an in-depth study of how LLMs can be used to retrieve and locate the important elements for a user-given query (i.e., a task description) in a web interface. In contrast with prior works, which primarily focused on autonomous web navigation, we decompose the problem into a more atomic operation: can LLMs identify the important information on a web page for a user-given query? This decomposition enables us to scrutinize the current capabilities of LLMs and to uncover the opportunities and challenges they present. Our empirical experiments show that while LLMs exhibit a reasonable level of performance in retrieving important UI elements, there is still substantial room for improvement. We hope our investigation will inspire follow-up work on overcoming the current challenges in this domain.

Introduction to LLMs and Web Interfaces

LLMs have shown a remarkable ability to comprehend a variety of data formats, including HTML. Given that web interfaces are built using HTML, it is crucial to understand how LLMs can be utilized to retrieve and interact with important elements on a web page in response to a user query. This understanding can potentially lead to more effective and efficient information retrieval from web interfaces, significantly improving user experience and productivity.

Experiment Design

The paper examines how well LLMs can extract relevant web elements based on user queries, using Anthropic's Claude 2, which stands out for its 100k-token context length. It explores four critical aspects that influence this process:

  1. The impact of example selection in few-shot prompting (see the sketch after this list),
  2. The specificity of user queries,
  3. Strategies for truncating HTML documents, and
  4. The persona adopted by the LLM during interaction.
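
One way to act on the first aspect is to rank a pool of demonstrations by semantic similarity to the user's query and include only the closest ones in the prompt. The Python sketch below illustrates this idea; the example-pool fields, the embedding model, and the prompt layout are illustrative assumptions, not details taken from the paper.

```python
# Sketch: picking semantically similar few-shot demonstrations.
# The example-pool fields, the embedding model, and the prompt layout
# are illustrative assumptions, not details from the paper.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def select_examples(user_query, example_pool, k=3):
    """Return the k demonstrations whose queries best match user_query."""
    queries = [ex["query"] for ex in example_pool]
    emb = encoder.encode([user_query] + queries, normalize_embeddings=True)
    sims = emb[1:] @ emb[0]            # cosine similarity (rows are unit-norm)
    return [example_pool[i] for i in np.argsort(-sims)[:k]]

def build_prompt(user_query, html, example_pool):
    """Assemble a few-shot prompt from the selected demonstrations."""
    parts = []
    for ex in select_examples(user_query, example_pool):
        parts.append(f"Query: {ex['query']}\nHTML: {ex['html']}\n"
                     f"Important elements: {ex['elements']}\n")
    parts.append(f"Query: {user_query}\nHTML: {html}\nImportant elements:")
    return "\n".join(parts)
```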

Findings and Challenges

The findings suggest that while LLMs are reasonably effective at retrieving web UI elements, there is still substantial room for improvement. A notable discovery is that the method of example selection in prompting can significantly affect LLM performance: semantically similar few-shot examples tend to improve recall, but too many examples hamper performance because of the longer input sequences. Abstracting the specificity of a query to more closely mimic actual user behavior did not consistently affect outcomes, whereas intelligently truncating HTML content led to substantial performance gains (one plausible truncation pass is sketched below).
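
The paper's exact truncation procedure is not reproduced here, but the gains reported above suggest pruning content that cannot describe visible UI elements. A minimal sketch, assuming BeautifulSoup is available; the kept-attribute list and the character cap are illustrative choices:

```python
# Sketch: pruning HTML before prompting. The kept-attribute list and the
# character cap are illustrative choices, not the paper's exact strategy.
from bs4 import BeautifulSoup

KEEP_ATTRS = {"id", "class", "name", "type", "href", "alt", "title", "aria-label"}

def truncate_html(raw_html: str, max_chars: int = 50_000) -> str:
    soup = BeautifulSoup(raw_html, "html.parser")
    # Drop subtrees that never describe visible UI elements.
    for tag in soup(["script", "style", "noscript", "svg", "meta", "link"]):
        tag.decompose()
    # Keep only attributes likely to identify or label an element.
    for tag in soup.find_all(True):
        tag.attrs = {k: v for k, v in tag.attrs.items() if k in KEEP_ATTRS}
    # Hard cap as a last resort so the prompt fits the context window.
    return str(soup)[:max_chars]
```

More sophisticated variants could score subtrees by relevance to the query instead of applying a fixed character cap.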

Moreover, the role assumed by the LLM (e.g., Web Assistant, Generic User, or UI Designer) also influenced outcomes, with the Web Assistant persona performing best. However, LLMs occasionally failed to follow directions or hallucinated references to nonexistent web elements, indicating areas where these models need refinement.
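
A persona is typically injected through the system prompt. The wording below is illustrative; the paper's exact persona prompts are not reproduced here:

```python
# Sketch: persona framing via the system prompt. The wording below is
# illustrative; the paper's exact persona prompts are not reproduced here.
PERSONAS = {
    "web_assistant": "You are a web assistant that helps users locate the "
                     "UI elements relevant to their task.",
    "generic_user":  "You are a person browsing this web page to get a task done.",
    "ui_designer":   "You are a UI designer reviewing this web page.",
}

def system_prompt(persona: str) -> str:
    return (PERSONAS[persona] + " Given a task description and the page HTML, "
            "list the elements that matter for completing the task. Only cite "
            "elements that actually appear in the HTML.")
```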

Conclusion and Outlook

The paper concludes with implications for future research, emphasizing further exploration of LLM responsiveness to user intent regardless of prompt specificity. Strategies to encode extensive HTML content within the limited context length of LLMs are considered vital for extending these capabilities. Researchers should focus not only on model enhancements but also on the privacy and security concerns raised by integrating personal user data.

Advances in this field promise more reliable and intelligent systems capable of helping users navigate an increasingly complex digital world efficiently.

Authors (3)
  1. Faria Huq
  2. Jeffrey P. Bigham
  3. Nikolas Martelaro