
Large Language Models are Not Stable Recommender Systems

Published 25 Dec 2023 in cs.IR and cs.AI (arXiv:2312.15746v1)

Abstract: Following the significant successes of LLMs on many natural language processing tasks, researchers have grown increasingly interested in building novel recommender systems on top of LLMs. However, we observe that directly using an LLM as a recommender is usually unstable due to its inherent position bias. We therefore conduct exploratory research and find consistent patterns of positional bias in LLMs that influence recommendation performance across a range of scenarios. We then propose a Bayesian probabilistic framework, STELLA (Stable LLM for Recommendation), which involves a two-stage pipeline. In the first, probing stage, we identify bias patterns as a transition matrix estimated on a probing detection dataset. In the second, recommendation stage, a Bayesian strategy uses this transition matrix, together with an entropy indicator, to adjust the biased output of the LLM. Our framework thus capitalizes on known bias patterns to calibrate the instability of LLMs and enhance recommendation performance. Extensive experiments validate the effectiveness of the framework.
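The two-stage pipeline in the abstract can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the uniform prior, and the soft Bayesian update are all hypothetical choices for exposition. The probing stage estimates a position transition matrix from probe prompts whose ground-truth target position is known; the recommendation stage then applies a Bayesian correction using that matrix, with the entropy of the LLM's position distribution serving as an instability indicator.

```python
import numpy as np

def probe_transition_matrix(probe_labels, probe_predictions, n_positions):
    """Probing stage: estimate T[i, j] = P(model picks position j | true
    item placed at position i), from probes with known ground truth."""
    T = np.zeros((n_positions, n_positions))
    for true_pos, pred_pos in zip(probe_labels, probe_predictions):
        T[true_pos, pred_pos] += 1
    # Row-normalize; fall back to a uniform row for unseen true positions.
    row_sums = T.sum(axis=1, keepdims=True)
    return np.where(row_sums > 0,
                    T / np.maximum(row_sums, 1),
                    1.0 / n_positions)

def debias_scores(llm_position_probs, T, prior=None):
    """Recommendation stage: Bayesian correction of the LLM's (biased)
    distribution over candidate positions.
    posterior(i) ∝ prior(i) * sum_j q(j) * T[i, j]."""
    n = T.shape[0]
    prior = np.full(n, 1.0 / n) if prior is None else prior
    posterior = prior * (T @ llm_position_probs)
    return posterior / posterior.sum()

def position_entropy(p):
    """Entropy of the LLM's position distribution; a high value flags
    an unstable (position-sensitive) output worth correcting."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())
```

For example, if probes show the model never misses when the target sits at position 1 but picks positions 0 and 1 equally often when the target sits at position 0, the posterior shifts probability mass back toward position 0 whenever the raw output favors position 1.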

