CFaiRLLM: Consumer Fairness Evaluation in Large-Language Model Recommender System (2403.05668v2)

Published 8 Mar 2024 in cs.IR

Abstract: This work takes a critical stance on previous studies concerning fairness evaluation in LLM-based recommender systems, which have primarily assessed consumer fairness by comparing recommendation lists generated with and without sensitive user attributes. Such approaches implicitly treat discrepancies in recommended items as biases, overlooking whether these changes might stem from genuine personalization aligned with users' true preferences. Moreover, these earlier studies typically address single sensitive attributes in isolation, neglecting the complex interplay of intersectional identities. In response to these shortcomings, we introduce CFaiRLLM, an enhanced evaluation framework that not only incorporates true preference alignment but also rigorously examines intersectional fairness by considering overlapping sensitive attributes. Additionally, CFaiRLLM introduces diverse user profile sampling strategies (random, top-rated, and recency-focused) to better understand the impact of the user profiles fed to LLMs in light of the inherent token limitations of these systems. Given that fairness depends on accurately understanding users' tastes and preferences, these strategies provide a more realistic assessment of fairness within RecLLMs. The results demonstrate that true preference alignment offers a more personalized and fair assessment compared to similarity-based measures, revealing significant disparities when sensitive and intersectional attributes are incorporated. Notably, our study finds that intersectional attributes amplify fairness gaps more prominently, especially in less structured domains such as music recommendations in LastFM.
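The abstract references two mechanisms that can be made concrete: the profile sampling strategies used to fit a user's history into a token-limited prompt, and a fairness measure based on alignment with true preferences rather than similarity between recommendation lists. The sketch below is illustrative only; the function names (`sample_profile`, `alignment`, `fairness_gap`), the hit-rate-style alignment metric, and the data layout are assumptions for exposition, not the paper's actual definitions.

```python
import random
from dataclasses import dataclass
from datetime import datetime
from typing import List, Set


@dataclass
class Interaction:
    item: str
    rating: float
    timestamp: datetime


def sample_profile(history: List[Interaction], k: int, strategy: str) -> List[str]:
    """Pick up to k history items to include in the LLM prompt (token budget)."""
    if strategy == "random":
        chosen = random.sample(history, min(k, len(history)))
    elif strategy == "top_rated":
        chosen = sorted(history, key=lambda x: x.rating, reverse=True)[:k]
    elif strategy == "recency":
        chosen = sorted(history, key=lambda x: x.timestamp, reverse=True)[:k]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [x.item for x in chosen]


def alignment(recs: List[str], true_prefs: Set[str]) -> float:
    """Share of recommended items that fall in the user's held-out true preferences."""
    return sum(r in true_prefs for r in recs) / len(recs) if recs else 0.0


def fairness_gap(neutral_recs: List[str], sensitive_recs: List[str],
                 true_prefs: Set[str]) -> float:
    """Drop in true-preference alignment when sensitive attributes are added to the prompt."""
    return alignment(neutral_recs, true_prefs) - alignment(sensitive_recs, true_prefs)


if __name__ == "__main__":
    history = [
        Interaction("item_a", 5.0, datetime(2023, 1, 1)),
        Interaction("item_b", 3.0, datetime(2023, 6, 1)),
        Interaction("item_c", 4.5, datetime(2023, 9, 1)),
    ]
    print(sample_profile(history, k=2, strategy="recency"))  # ['item_c', 'item_b']
    # Neutral prompt hits both true preferences; attribute-conditioned prompt hits one.
    print(fairness_gap(["item_a", "item_c"], ["item_a", "item_x"],
                       {"item_a", "item_c"}))  # 1.0 - 0.5 = 0.5
```

A positive gap under this kind of measure would indicate that adding the sensitive (or intersectional) attribute pushed recommendations away from what the user actually likes, rather than merely changing the list.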

Authors (2)
  1. Yashar Deldjoo (46 papers)
  2. Tommaso Di Noia (59 papers)
Citations (11)