ChatShop: Interactive Information Seeking with Language Agents (2404.09911v2)
Abstract: The desire and ability to seek new information strategically are fundamental to human learning but often overlooked in current language agent evaluation. We analyze a popular web shopping task designed to test language agents' ability to perform strategic exploration and discover that it can be reformulated and solved as a single-turn retrieval task without the need for interactive information seeking. This finding encourages us to rethink realistic constraints on information access that would necessitate strategic information seeking. We then redesign the task to introduce a notion of task ambiguity and the role of a shopper, serving as a dynamic party with whom the agent strategically interacts in an open-ended conversation to make informed decisions. Our experiments demonstrate that the proposed task can effectively evaluate the agent's ability to explore and gradually accumulate information through multi-turn interactions. Additionally, we show that LLM-simulated shoppers serve as a good proxy for real human shoppers, revealing similar error patterns in agents.