Review-LLM: Harnessing LLMs for Personalized Review Generation
Introduction
The paper "Review-LLM: Harnessing LLMs for Personalized Review Generation" addresses the challenge of generating personalized reviews in e-commerce settings using LLMs. While LLMs like ChatGPT exhibit superior text modeling capabilities, leveraging these models directly for review generation poses certain issues, such as the tendency to generate overly polite reviews and the lack of personalized input from user history. To tackle this, the authors propose Review-LLM, a system that customizes LLMs to account for user-specific preferences and sentiments, improving the quality and relevance of the generated reviews.
Methodology
The proposed Review-LLM framework reconstructs the prompt input by incorporating user historical behaviors, item titles, and corresponding reviews. By integrating this information, the model can better capture user interest features and review writing styles. Additionally, user ratings are included in the prompt to indicate satisfaction levels, thus influencing the sentiment of the generated reviews.
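As a rough illustration of this prompt reconstruction, the user's historical interactions could be serialized into a single input string. The template wording and field names below are illustrative assumptions, not the paper's exact prompt:

```python
def build_review_prompt(history, target_item, target_rating):
    """Assemble a Review-LLM-style prompt from user history and a target item.

    `history` is a list of (item_title, review_text, rating) tuples.
    The wording here is a paraphrase, not the paper's exact template.
    """
    lines = [
        "Instruction: Considering the user's preferences and historical "
        "behaviors below, write a review for the target item.",
        "Input:",
    ]
    for title, review, rating in history:
        lines.append(f'- Item: "{title}" | Rating: {rating}/5 | Review: "{review}"')
    lines.append(f'Target Item: "{target_item}" | Rating: {target_rating}/5')
    lines.append("Response:")
    return "\n".join(lines)


prompt = build_review_prompt(
    history=[("Wireless Mouse", "Clicks feel mushy, not worth it.", 2)],
    target_item="Mechanical Keyboard",
    target_rating=5,
)
print(prompt)
```

Including the rating alongside each historical review is what lets the model condition the sentiment of its output on the user's satisfaction level.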
Review-LLM utilizes Supervised Fine-Tuning (SFT) with Low-Rank Adaptation (LoRA) for parameter-efficient training. This fine-tuning process allows the LLM to generate a personalized review for a given user and target item. The input prompt for Review-LLM is composed of the following:
- Generation Instruction: Instructs the LLM to consider user preferences and historical behaviors to generate the review.
- Input: Contains the items previously interacted with by the user, along with their titles, reviews, and ratings.
- Target Item: Information about the newly purchased item and its rating.
- Response: The generated review for the target item.
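The LoRA idea behind this parameter-efficient fine-tuning can be sketched numerically: instead of updating a full weight matrix W, a trainable low-rank product BA is added to the frozen W. The dimensions and rank below are toy values for illustration, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                          # hidden size and LoRA rank (toy values)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized


def lora_forward(x):
    # Frozen path plus low-rank update: equivalent to x @ (W + B @ A).T
    return x @ W.T + (x @ A.T) @ B.T


x = rng.normal(size=(1, d))
# With B zero-initialized, the adapter starts as a no-op, so the
# adapted model initially matches the frozen pretrained model exactly.
assert np.allclose(lora_forward(x), x @ W.T)
```

Only the 2·d·r adapter parameters (32 here) are trained instead of the d² (64) in W; for a fixed rank the savings grow quadratically with the hidden size, which is what makes SFT of a large model tractable.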
Experimental Results
The authors conducted experiments on five Amazon review datasets and compared Review-LLM with several baselines, including GPT-3.5-Turbo, GPT-4o, and Llama-3-8b. Performance was evaluated using ROUGE-1, ROUGE-L, and BertScore, a BERT-based semantic similarity metric.
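To make these metrics concrete: ROUGE-1 measures unigram overlap between the generated and reference reviews, while ROUGE-L measures their longest common subsequence, rewarding preserved word order. A minimal F1-style sketch (a from-scratch illustration, not the official scoring packages):

```python
def rouge_1_f1(candidate, reference):
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    cand, ref = candidate.split(), reference.split()
    overlap = sum(min(cand.count(w), ref.count(w)) for w in set(cand))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def rouge_l_f1(candidate, reference):
    """LCS-based F1 (ROUGE-L) via dynamic programming."""
    cand, ref = candidate.split(), reference.split()
    m, n = len(cand), len(ref)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if cand[i] == ref[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / m, lcs / n
    return 2 * precision * recall / (precision + recall)


gen = "the keyboard feels great and types smoothly"
ref = "the keyboard types smoothly and feels great"
print(rouge_1_f1(gen, ref))  # identical bag of words -> 1.0
print(rouge_l_f1(gen, ref))  # lower, because word order differs
```

The example shows why both are reported: the two sentences share every word (perfect ROUGE-1) yet differ in ordering, which only ROUGE-L penalizes.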
Simple Evaluation
The experimental results indicate that Review-LLM significantly outperforms the baselines across all metrics. Specifically, the inclusion of user ratings in the prompt contributes to better performance:
- ROUGE-1: 31.15
- ROUGE-L: 26.88
- BertScore: 49.52
Negative Review Performance
To test the model's ability to generate negative reviews, a hard evaluation dataset composed solely of negative reviews was used. Review-LLM demonstrated superior performance in reflecting user dissatisfaction compared to the baselines, reaffirming the effectiveness of incorporating rating information:
- ROUGE-1: 21.93
- ROUGE-L: 16.63
- BertScore: 39.35
Human Evaluation and Case Study
Human evaluators confirmed that Review-LLM's generated reviews were more semantically consistent with the reference reviews. A case study further illustrated that Review-LLM could produce reviews that better reflect the user's sentiment and writing style, compared to GPT-3.5-Turbo and GPT-4o.
Implications and Future Work
The findings imply that personalized review generation can be significantly enhanced by aggregating rich user behavior data and integrating it into LLMs through supervised fine-tuning. Practically, this approach can improve the quality and relevance of automated reviews in e-commerce platforms, potentially enhancing user satisfaction and engagement.
Future research should focus on addressing the limitations of the current framework. Specifically, capturing the diversity of individual preferences and incorporating the temporal dynamics of user interactions could further refine the personalization aspect. Additionally, extending this approach to other domains where personalized content generation is critical could offer broader applicability.
Conclusion
The proposed Review-LLM framework successfully leverages LLMs for personalized review generation by integrating detailed user behavior data and ratings into the model inputs. The fine-tuning approach ensures that the generated reviews reflect user-specific preferences and sentiments, outperforming state-of-the-art models like GPT-3.5-Turbo and GPT-4o. This work underscores the potential of LLMs in enhancing personalized content generation in recommender systems, paving the way for future innovations in AI-driven personalization.