Behavior Alignment: A New Perspective of Evaluating LLM-based Conversational Recommender Systems (2404.11773v2)
Abstract: LLMs have demonstrated great potential in Conversational Recommender Systems (CRS). However, applying LLMs to CRS has exposed a notable behavioral discrepancy between LLM-based CRS and human recommenders: LLMs often appear inflexible and passive, frequently rushing to complete the recommendation task without sufficient inquiry. This discrepancy can reduce recommendation accuracy and lower user satisfaction. Despite its importance, existing CRS research has not examined how to measure it. To fill this gap, we propose Behavior Alignment, a new evaluation metric that measures how consistent the recommendation strategies of an LLM-based CRS are with those of human recommenders. Our experimental results show that the new metric aligns better with human preferences and differentiates system performance better than existing evaluation metrics. Because Behavior Alignment requires explicit and costly human annotations of the recommendation strategies, we also propose a classification-based method that measures Behavior Alignment implicitly from the responses. The evaluation results confirm the robustness of the method.
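The abstract does not give the formula for Behavior Alignment, so the following is only a minimal sketch of the general idea, assuming the metric can be approximated as the fraction of recommender turns where the system's annotated strategy matches the human recommender's. The function name `strategy_agreement` and the strategy labels ("inquire", "recommend", "explain") are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def strategy_agreement(system: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of turns where the system's strategy label equals the human's.

    Assumption: one strategy label per recommender turn, aligned by position.
    This is an illustrative agreement rate, not the paper's exact metric.
    """
    if len(system) != len(human):
        raise ValueError("label sequences must have the same length")
    if not system:
        raise ValueError("at least one turn is required")
    matches = sum(s == h for s, h in zip(system, human))
    return matches / len(system)


# Hypothetical annotations: the human asks twice before recommending,
# while the system rushes to recommend on the first turn.
human_turns = ["inquire", "inquire", "recommend", "explain"]
system_turns = ["recommend", "inquire", "recommend", "recommend"]

score = strategy_agreement(system_turns, human_turns)
print(score)  # 0.5
```

A score near 1 would mean the system follows the same strategy as the human at almost every turn. The low score here reflects the passive, rush-to-recommend behavior the abstract describes.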
Authors: Dayu Yang, Fumian Chen, Hui Fang