
ChatShop: Interactive Information Seeking with Language Agents (2404.09911v2)

Published 15 Apr 2024 in cs.CL

Abstract: The desire and ability to seek new information strategically are fundamental to human learning but often overlooked in current language agent evaluation. We analyze a popular web shopping task designed to test language agents' ability to perform strategic exploration and discover that it can be reformulated and solved as a single-turn retrieval task without the need for interactive information seeking. This finding encourages us to rethink realistic constraints on information access that would necessitate strategic information seeking. We then redesign the task to introduce a notion of task ambiguity and the role of a shopper, serving as a dynamic party with whom the agent strategically interacts in an open-ended conversation to make informed decisions. Our experiments demonstrate that the proposed task can effectively evaluate the agent's ability to explore and gradually accumulate information through multi-turn interactions. Additionally, we show that LLM-simulated shoppers serve as a good proxy for real human shoppers, revealing similar error patterns in agents.
