Generative Recommendation: Towards Next-generation Recommender Paradigm

Published 7 Apr 2023 in cs.IR | (2304.03516v2)

Abstract: Recommender systems typically retrieve items from an item corpus for personalized recommendations. However, such a retrieval-based recommender paradigm faces two limitations: 1) the human-generated items in the corpus might fail to satisfy the users' diverse information needs, and 2) users usually adjust the recommendations via inefficient passive feedback, e.g., clicks. Nowadays, AI-Generated Content (AIGC) has revealed significant success, offering the potential to overcome these limitations: 1) generative AI can produce personalized items to satisfy users' information needs, and 2) the newly emerged LLMs significantly reduce the efforts of users to precisely express information needs via natural language instructions. In this light, the boom of AIGC points the way towards the next-generation recommender paradigm with two new objectives: 1) generating personalized content through generative AI, and 2) integrating user instructions to guide content generation. To this end, we propose a novel Generative Recommender paradigm named GeneRec, which adopts an AI generator to personalize content generation and leverages user instructions. Specifically, we pre-process users' instructions and traditional feedback via an instructor to output the generation guidance. Given the guidance, we instantiate the AI generator through an AI editor and an AI creator to repurpose existing items and create new items. Eventually, GeneRec can perform content retrieval, repurposing, and creation to satisfy users' information needs. Besides, to ensure the trustworthiness of the generated items, we emphasize various fidelity checks. Moreover, we provide a roadmap to envision future developments of GeneRec and several domain-specific applications of GeneRec with potential research tasks. Lastly, we study the feasibility of implementing AI editor and AI creator on micro-video generation.


Citations (64)


Summary

The paper presents a unified GeneRec architecture combining instruction, editing, and creation modules to innovate beyond static retrieval.
It demonstrates personalized micro-video recommendations through techniques like CLIP-based thumbnail selection and AI-driven style transfer.
The study outlines a roadmap for integrating deep user interaction, fidelity checks, and regulatory measures in next-generation recommender systems.

Generative Recommendation: A Comprehensive Analysis of the GeneRec Paradigm

Introduction and Motivation

Conventional retrieval-based recommender systems (RS) predominantly rely on matching user profiles against a static corpus of human-generated items. This paradigm, while effective, is constrained by the coverage of the item corpus and by the passivity of user feedback mechanisms. The rapid maturation of AI-generated content (AIGC) and large language models (LLMs) opens new avenues for overcoming these limitations by enabling on-demand content generation and direct, instruction-driven interaction with users. The paper "Generative Recommendation: Towards Next-generation Recommender Paradigm" proposes GeneRec, a generative recommendation paradigm that redefines both what counts as an item and how users interact with the system.

Figure 1: An example of using generative AI to interact with users and generate new items in the micro-video domain.

The GeneRec Architecture

GeneRec instantiates the generative recommender paradigm through three principal modules:

Instructor: Processes multimodal user instructions and implicit/explicit feedback, determining when and how content generation should be triggered.
AI Editor: Repurposes or edits existing items based on individual preference, leveraging both user histories and specific instructions.
AI Creator: Synthesizes entirely new items, conditioned on user intent (both implicit and explicit) and external knowledge.

Figure 2: Illustration of the GeneRec paradigm. The AI generator takes user instructions and feedback to generate personalized content, which can be directly recommended or stored for future ranking alongside human-generated items.

This architecture introduces a closed loop between the user and an AI generator, distinct from legacy pipelines. Personalized content, either edited or created de novo, can bypass traditional item ranking if explicitly requested or in the event of repeated negative signals on corpus items.
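
The routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the names `Guidance`, `UserState`, `instructor`, and `recommend`, the `max_skips` threshold, and the rule that an instruction starting with "create" selects the AI creator are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Guidance:
    """Output of the (hypothetical) instructor: what the generator should do."""
    action: str                 # "retrieve", "edit", or "create"
    instruction: Optional[str]  # explicit user instruction, if any


@dataclass
class UserState:
    instruction: Optional[str] = None                         # e.g. "make it cartoon style"
    recent_feedback: List[int] = field(default_factory=list)  # 1 = like, 0 = skip


def instructor(user: UserState, max_skips: int = 3) -> Guidance:
    """Decide whether content generation should be triggered.

    Assumed rule: an explicit instruction always triggers generation;
    otherwise `max_skips` consecutive skips on corpus items trigger an edit.
    """
    if user.instruction:
        action = "create" if user.instruction.startswith("create") else "edit"
        return Guidance(action, user.instruction)
    last = user.recent_feedback[-max_skips:]
    if len(last) == max_skips and not any(last):
        return Guidance("edit", None)
    return Guidance("retrieve", None)


def recommend(user: UserState, corpus: List[str]) -> str:
    """Route the request to retrieval, the AI editor, or the AI creator."""
    guidance = instructor(user)
    if guidance.action == "retrieve":
        return corpus[0]                        # stand-in for the usual ranking step
    if guidance.action == "edit":
        return f"edited({corpus[0]})"           # stand-in for the AI editor
    return f"created({guidance.instruction})"   # stand-in for the AI creator
```

In a real system the string placeholders would be replaced by calls to a ranking model and to generative models, and the trigger rule would more plausibly be learned than hard-coded.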

Figure 3: A demonstration of GeneRec showing interactions between users/human producers and AI generators, and illustrating workflows for repurposing and creation.

Roadmap and Evolutionary Perspective

GeneRec presents a comprehensive roadmap for generative RS evolution along three axes:

User-System Interaction: Transitioning from passive feedback (clicks, dwell time) to multimodal conversational interfaces employing LLMs. This will facilitate in-depth, rapid elicitation of user needs and richer preference modeling.
Content Generation: A three-phase evolution from expert-generated to user-generated to AI-generated content, in which AIGC first augments and then partly supplants manual item creation.
Algorithmic Advances: Convergence of discriminative and generative models, advancing to unified frameworks that can handle retrieval, repurposing, and open-ended creation, all informed by LLM-style architectures.

Figure 4: A potential roadmap for the GeneRec paradigm from user interaction, content generation, and algorithmic innovation perspectives.

Fidelity, Evaluation, and Trustworthiness

The authors stress that the flexibility of AIGC heightens the need for fidelity checks, which they treat as a prerequisite for practical deployment. The trustworthiness stack for GeneRec encompasses:

Bias and fairness mitigation
Privacy preservation given the use of implicit/explicit user data
Safety filtering (e.g., toxicity, shilling attacks)
Authenticity verification (to counter hallucination/misinformation)
Legal compliance (including copyright adjudication and domain-specific regulations)
Identifiability (distinction between human-/AI-generated content via watermarking or forensic tools)
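
One way to picture such a stack is as a chain of independent checks that a generated item must pass before it is recommended. The sketch below is a toy illustration, not the paper's method: the item fields (`caption`, `ai_generated`, `watermark`, `license`), the `BLOCKLIST`, and all function names are hypothetical, and only three of the six checks are shown.

```python
from typing import Callable, Dict, List, Tuple

Item = Dict[str, object]
Check = Callable[[Item], bool]

BLOCKLIST = {"scam", "hate"}  # toy stand-in for a real safety classifier


def is_safe(item: Item) -> bool:
    """Safety filtering: reject captions containing blocklisted words."""
    words = set(str(item.get("caption", "")).lower().split())
    return not (BLOCKLIST & words)


def is_identifiable(item: Item) -> bool:
    """Identifiability: AI-generated items must be flagged and watermarked."""
    return item.get("ai_generated") is True and bool(item.get("watermark"))


def is_licensed(item: Item) -> bool:
    """Legal compliance: only items with a known license pass (assumed values)."""
    return item.get("license") in {"owned", "cc-by"}


CHECKS: List[Tuple[str, Check]] = [
    ("safety", is_safe),
    ("identifiability", is_identifiable),
    ("legal", is_licensed),
]


def run_fidelity_checks(item: Item) -> List[str]:
    """Return the names of failed checks; an empty list means the item passes."""
    return [name for name, check in CHECKS if not check(item)]
```

Returning the list of failed checks, instead of a single boolean, keeps the reason for a rejection available for logging or for regenerating the item.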

Additionally, GeneRec employs a two-sided evaluation strategy: item-side (objective metrics such as Fréchet Video Distance (FVD) for videos, plus content relevance) and user-side (explicit and implicit satisfaction signals, retention, etc.).
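
For reference, FVD as commonly defined is the Fréchet distance between two Gaussians fitted to features of real and generated videos, with the features taken from a pretrained video network. The symbols below are notation introduced here for illustration, not taken from the paper: \mu_r, \Sigma_r are the mean and covariance of real-video features, and \mu_g, \Sigma_g those of generated-video features. Lower values indicate that the generated distribution is closer to the real one.

```latex
\mathrm{FVD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```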

Application Demonstrations in Micro-video Recommendation

The feasibility study operationalizes the GeneRec modules in the context of micro-video recommendation:

Personalized Thumbnail Selection and Generation: CLIP-based zero-shot selection aligns frames with user histories, while a retrieval-augmented diffusion model (RDM) yields higher-quality, user-aligned thumbnails, outperforming static baselines.

Figure 5: Illustration of the implementation of editing tasks in the AI editor: thumbnail selection, thumbnail generation, and micro-video clipping.

Figure 6: Cases of personalized thumbnail selection by CLIP.

Figure 7: Cases of personalized thumbnail generation.

Micro-video Clipping: Personalized segment extraction using CLIP representations localizes user-preferred content within longer videos.

Figure 8: Case study of micro-video clipping via CLIP.

Style Transfer and Content Editing: VToonify and Masked Conditional Video Diffusion (MCVD) enable both explicit (instruction-driven) and implicit (preference-driven) video repurposing, supported by quantitative improvements (Cosine@K, FVD) and qualitative case studies.

Figure 9: Examples of personalized micro-video style transfer via VToonify.

Figure 10: Case study of personalized micro-video content revision via MCVD (User_Emb).

Personalized Video Creation: Single-turn and multi-turn instructions drive MCVD to synthesize new micro-videos, although visual fidelity and domain coverage remain notably limited by dataset and modeling constraints.

Figure 11: Case study of personalized micro-video content creation via MCVD (User_Emb).
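
The CLIP-based selection and clipping steps above both reduce to scoring frames against a user representation. The sketch below is a minimal illustration under assumptions: the frame and history embeddings are taken as given (in practice they would come from a CLIP image encoder), the user representation is assumed to be the mean of the history embeddings, and the function names and the fixed clip `length` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean; used here as an assumed user representation."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def frame_scores(frame_embs: Sequence[Vector], history_embs: Sequence[Vector]) -> List[float]:
    """Score every frame by its similarity to the user's history."""
    user = mean_vector(history_embs)
    return [cosine(f, user) for f in frame_embs]


def select_thumbnail(frame_embs: Sequence[Vector], history_embs: Sequence[Vector]) -> int:
    """Return the index of the frame most similar to the user's history."""
    scores = frame_scores(frame_embs, history_embs)
    return max(range(len(scores)), key=scores.__getitem__)


def best_clip(frame_embs: Sequence[Vector], history_embs: Sequence[Vector],
              length: int) -> Tuple[int, int]:
    """Return [start, end) of the `length`-frame window with the highest total score."""
    scores = frame_scores(frame_embs, history_embs)
    starts = range(len(scores) - length + 1)
    start = max(starts, key=lambda s: sum(scores[s:s + length]))
    return start, start + length
```

For example, with toy 2-D embeddings `frames = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [0.1, 0.9]]` and `history = [[0.0, 1.0], [0.2, 0.8]]`, `select_thumbnail(frames, history)` picks frame 3 and `best_clip(frames, history, 2)` picks the window `(2, 4)`.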

Domain-General Implications and Future Research Directions

The GeneRec paradigm generalizes across multiple domains: news, fashion, music, image, and video. Its modularity allows tailored fidelity checks and generation routines critical for each vertical (e.g., bias detection in news, style/realism in fashion, copyright in music). The approach also foregrounds new research directions:

Instruction tuning for LLM-based instructors specialized for recommendation domains
Behavioral policy learning for generator activation and output routing (direct recommendation vs. corpus insertion)
Unified modeling for personalized editing and creation leveraging both explicit instructions and multimodal user histories
Domain-specific fidelity verifiers, evaluation protocols, and regulatory frameworks for AI-generated recommendations

Figure 12: Illustration of advanced foundation models and cross-modal AIGC applications.

Comparative Perspective

GeneRec is differentiated from existing conversational recommenders by (a) robust instruction-following faculties powered by LLMs, (b) generative content workflows vs. mere retrieval, and (c) joint optimization for relevance, trust, and compliance. Unlike generic AIGC, GeneRec capitalizes on established user modeling strategies in RS for implicit/explicit preference extraction and integrates generative and discriminative pathways.

Conclusion

GeneRec represents a systemic shift in recommender paradigms, tightly integrating content generation, nuanced user-system communication, and trustworthy recommendation. The presented experiments illustrate both the opportunities and the technical gaps of current AIGC for personalization. As generative models and LLMs continue to scale in capability and domain adaptation, GeneRec's vision points toward user-defined, instruction-driven, and multimodal recommender systems. The paradigm is likely to motivate foundational research in fidelity assessment, legal and ethical compliance, unified generative architectures, and advanced user modeling.
