
USimAgent: Large Language Models for Simulating Search Users (2403.09142v2)

Published 14 Mar 2024 in cs.IR and cs.AI

Abstract: Due to the advantages in the cost-efficiency and reproducibility, user simulation has become a promising solution to the user-centric evaluation of information retrieval systems. Nonetheless, accurately simulating user search behaviors has long been a challenge, because users' actions in search are highly complex and driven by intricate cognitive processes such as learning, reasoning, and planning. Recently, LLMs have demonstrated remarkable potential in simulating human-level intelligence and have been used in building autonomous agents for various tasks. However, the potential of using LLMs in simulating search behaviors has not yet been fully explored. In this paper, we introduce an LLM-based user search behavior simulator, USimAgent. The proposed simulator can simulate users' querying, clicking, and stopping behaviors during search, and thus, is capable of generating complete search sessions for specific search tasks. Empirical investigation on a real user behavior dataset shows that the proposed simulator outperforms existing methods in query generation and is comparable to traditional methods in predicting user clicks and stopping behaviors. These results not only validate the effectiveness of using LLMs for user simulation but also shed light on the development of more robust and generic user simulators. The code and data are accessible at https://github.com/Meow-E/USimAgent.

Authors (5)
  1. Erhan Zhang (5 papers)
  2. Xingzhu Wang (3 papers)
  3. Peiyuan Gong (5 papers)
  4. Yankai Lin (125 papers)
  5. Jiaxin Mao (47 papers)
Citations (6)

Summary

  • The paper introduces USimAgent, a unified LLM framework that simulates full user search sessions by modeling queries, clicks, and stopping behaviors.
  • The methodology employs context-driven reasoning to integrate past interactions, enabling realistic simulation of complex search behaviors.
  • Empirical results show that USimAgent outperforms traditional methods in query generation and achieves competitive accuracy in click and stop predictions.

Introducing USimAgent: Advancing User Simulation with LLMs

Overview of USimAgent

User simulation is a promising method for evaluating information retrieval systems, but it has long struggled to replicate complex user search behaviors faithfully. This paper introduces USimAgent, an LLM-based framework designed to simulate three user search behaviors: querying, clicking, and stopping. By building on recent advances in LLMs, USimAgent aims to simulate a full user search session more accurately and completely than existing simulation methods.

Technical Foundations and Innovations

The paper's methodology section outlines the foundations of user simulation and highlights the complex, dynamic nature of user search behavior, which includes generating queries, interacting with search engine results pages (SERPs), and deciding when to conclude a session. Unlike previous approaches, which often rely on separate models for different aspects of search behavior, USimAgent uses a single LLM to simulate the entire search session dynamically. This approach takes advantage of LLMs' capabilities in natural language understanding, zero-shot and few-shot learning, multi-tasking, and coherent action planning.

A key innovation in USimAgent is its use of context-driven reasoning to simulate the cognitive processes underpinning users' search activities. By integrating context from previous interactions within a session, USimAgent can generate more realistic queries, predict clicks with accuracy comparable to traditional methods, and determine when to end a session. This is a step toward capturing the intricate cognitive processes that influence real-world user behavior in search.
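The session loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Session`, `Round`, `simulate_session`, `toy_llm`, and `toy_search` are hypothetical, the prompt layout is an assumption, and the LLM is replaced by a deterministic stub so the example runs without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Round:
    query: str
    results: List[str]
    clicks: List[int]  # indices into `results`


@dataclass
class Session:
    task: str
    rounds: List[Round] = field(default_factory=list)

    def context(self) -> str:
        """Serialize the task and all earlier interactions into one prompt context."""
        lines = [f"Task: {self.task}"]
        for i, r in enumerate(self.rounds, 1):
            clicked = [r.results[j] for j in r.clicks]
            lines.append(f"Round {i}: query={r.query!r}; clicked={clicked}")
        return "\n".join(lines)


def simulate_session(
    task: str,
    llm: Callable[[str, str], str],      # (action, prompt) -> text; hypothetical interface
    search: Callable[[str], List[str]],  # query -> result titles; hypothetical interface
    max_rounds: int = 5,
) -> Session:
    """One model handles all three actions (query, click, stop), each conditioned on history."""
    session = Session(task)
    for _ in range(max_rounds):
        query = llm("query", session.context()).strip()
        results = search(query)
        listing = "\n".join(f"{i}: {title}" for i, title in enumerate(results))
        raw = llm("click", f"{session.context()}\nQuery: {query}\nResults:\n{listing}")
        # Keep only valid result indices from the model's reply.
        clicks = [int(t) for t in raw.replace(",", " ").split()
                  if t.isdigit() and int(t) < len(results)]
        session.rounds.append(Round(query, results, clicks))
        if llm("stop", session.context()).strip().lower().startswith("stop"):
            break
    return session


def toy_llm(action: str, prompt: str) -> str:
    """Deterministic stand-in for an LLM (assumption for illustration only)."""
    completed = prompt.count("Round ")
    if action == "query":
        return f"query {completed + 1}"
    if action == "click":
        return "0"
    return "stop" if completed >= 2 else "continue"


def toy_search(query: str) -> List[str]:
    """Stand-in search engine returning three fake result titles."""
    return [f"{query} result {i}" for i in range(3)]
```

Calling `simulate_session("find a laptop", toy_llm, toy_search)` yields a complete session object. In a real setup, `toy_llm` would be replaced by calls to an actual model and `toy_search` by a real retrieval system; the point of the sketch is only that each decision sees the accumulated session context.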

Empirical Validation

The experimental results show that USimAgent generates queries better than traditional simulation methods. Evaluating the model on a real user behavior dataset, the authors find that USimAgent outperforms existing query generation methods and achieves comparable results in simulating clicking and stopping behaviors. These findings support the use of LLMs for user simulation, and they also point to room for improvement in predicting clicks and stopping decisions.
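This summary does not name the metrics used in the paper, so the following is only an illustrative sketch of how simulated behavior might be scored against logged behavior. Token-overlap F1 for queries and plain accuracy for binary click or stop decisions are assumptions chosen for simplicity, and the function names `token_f1` and `binary_accuracy` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def token_f1(generated: str, reference: str) -> float:
    """Token-overlap F1 between a simulated query and the query a real user issued."""
    gen = generated.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def binary_accuracy(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Fraction of click (or stop) decisions that match the logged user decisions."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(predicted)
```

For example, `token_f1("cheap laptop deals", "cheap laptop")` rewards the two shared tokens while penalizing the extra one, and `binary_accuracy` treats each result position as an independent click or no-click decision.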

Implications and Future Directions

The development and validation of USimAgent open new possibilities for using LLMs to simulate user behavior in information retrieval evaluation, potentially leading to more robust, efficient, and user-centric evaluation methodologies. This could in turn inform the design and improvement of information retrieval systems, making them better attuned to real-world user needs and behaviors.

However, the paper also identifies areas that need further research, notably combining LLM-based simulation methods with larger datasets to improve predictive accuracy. Future work could also explore more advanced LLMs and refine the context and reasoning mechanisms within the simulation framework.

Conclusion

The paper's contribution, USimAgent, advances user simulation for the evaluation of information retrieval systems by using LLMs to simulate complex user search behaviors more effectively. Besides demonstrating the potential of LLMs in this domain, the paper lays a foundation for future research on improving the fidelity and accuracy of user simulations, which could ultimately support more user-centric search technologies.