LLM-as-RS: Unifying LLMs in Recommender Systems
- LLM-as-RS is a paradigm that models recommendation as conditional text generation by encoding user history into natural language prompts.
- Scaling experiments show LLM-as-RS achieves up to a 20% improvement in Recall@k over SID-based generative recommenders, with performance increasing steadily as LLM size and LoRA parameter count grow.
- The approach uses parameter-efficient fine-tuning (e.g., LoRA) to bypass semantic ID bottlenecks, enhancing semantic understanding and collaborative filtering for robust recommendations.
The LLM-as-RS paradigm refers to the use of LLMs as the central modeling and inference engine within recommender systems, unifying semantic understanding with collaborative filtering in a generative, often end-to-end, fashion. Originating as a response to the limitations of traditional ID-based and SID-based generative recommenders, this approach treats recommendation as a text-to-text sequence modeling task: a prompt encoding a user’s history (in natural language) is fed to an LLM, which then predicts the next item’s title. The LLM-as-RS method leverages the representational capacity and scaling laws of modern LLMs to directly learn both rich semantic information and higher-order user–item interaction patterns. It departs fundamentally from paradigms relying on semantic IDs or discrete codes, promising significantly improved performance and transferability across domains.
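As a concrete illustration of this text-to-text framing, the minimal sketch below builds a prompt from a hypothetical viewing history and pairs it with the next item's title as the generation target; the template wording and example items are illustrative assumptions, not the paper's exact prompt.

```python
# Minimal sketch of the text-to-text formulation (illustrative template, not the paper's exact prompt).
history = ["The Matrix (1999)", "Blade Runner 2049 (2017)", "Arrival (2016)"]
next_item = "Dune (2021)"  # ground-truth next item, used as the generation target

prompt = (
    "A user watched the following movies in order:\n"
    + "\n".join(f"{i + 1}. {title}" for i, title in enumerate(history))
    + "\nPredict the title of the next movie the user will watch:"
)
target = next_item  # the LLM is fine-tuned to generate this title given the prompt

print(prompt)
print("Target:", target)
```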
1. Formal Definition and Model Paradigms
Two main generative recommendation (GR) paradigms are contrasted:
- SID-based GR: Item content is processed by a modality encoder and a vector quantization tokenizer to obtain short discrete codes (semantic IDs, SIDs) that are used as tokens in sequential modeling. This setup trains a conventional transformer to autoregressively predict SIDs in a user interaction sequence (a toy quantization sketch follows this list).
- LLM-as-RS: The entire recommendation task is modeled as a conditional generation problem. The user history and candidate items are represented directly as text. The LLM, typically a large-scale, pretrained decoder-only model, is fine-tuned (parameter-efficiently, e.g., using LoRA) to take a prompt with user history and generate the intended next item as a text span (such as the item’s title).
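For concreteness, the toy sketch below shows how an SID-style tokenizer might turn a continuous item embedding into a short discrete code sequence via residual quantization; the codebook sizes, depth, and random data are illustrative assumptions rather than the specific tokenizer used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy residual quantizer: two levels, each with a small codebook (illustrative sizes).
num_levels, codebook_size, dim = 2, 8, 16
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(num_levels)]

def item_to_sid(embedding: np.ndarray) -> list[int]:
    """Quantize an item embedding into a short sequence of discrete codes (a 'semantic ID')."""
    residual = embedding.copy()
    sid = []
    for codebook in codebooks:
        distances = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(distances))      # nearest codeword at this level
        sid.append(idx)
        residual = residual - codebook[idx]  # quantize the leftover residual at the next level
    return sid

item_embedding = rng.normal(size=dim)  # e.g., output of a frozen modality/content encoder
print("semantic ID:", item_to_sid(item_embedding))
```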
The LLM-as-RS approach forgoes semantic ID compression, instead leveraging the LLM’s native capacity for both semantic and collaborative filtering (CF) reasoning through simple text-based prompts.
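A minimal inference sketch of this formulation, using Hugging Face transformers, is shown below; the checkpoint name and decoding settings are placeholders, and in practice the backbone would first be LoRA-fine-tuned on recommendation data (see Section 4).

```python
# Sketch of LLM-as-RS inference: the model generates the next item's title from a text prompt.
# The checkpoint below is a placeholder backbone, not necessarily the one used in the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = (
    "A user watched the following movies in order:\n"
    "1. The Matrix (1999)\n2. Blade Runner 2049 (2017)\n3. Arrival (2016)\n"
    "Predict the title of the next movie the user will watch:"
)

inputs = tokenizer(prompt, return_tensors="pt")
# Generate a short text span (the predicted item title); beam search yields a ranked top-k list.
outputs = model.generate(**inputs, max_new_tokens=16, num_beams=5, num_return_sequences=5)
for seq in outputs:
    title = tokenizer.decode(seq[inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(title.strip())
```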
2. Scaling Laws and Performance Saturation
Empirical Scaling in SID-based GR vs. LLM-as-RS
- In SID-based GR, scaling laws are observed only in low-capacity regimes. When the capacity of the sequential recommender module is increased beyond approximately 10M parameters, performance quickly saturates. Furthermore, increasing the size of the underlying LLM encoder or quantization tokenizer does not overcome this ceiling, due to the inherent loss of information in the quantization bottleneck.
- In contrast, the LLM-as-RS paradigm exhibits robust scaling behavior with both the LoRA adaptation parameters and the frozen LLM backbone, with no observed saturation up to LLM sizes of 14B parameters. The empirical law describing Recall@k as a function of parameters takes the form
  Recall@k ≈ γ + α·log(N_LoRA) + β·log(N_LLM),
  where N_LoRA is the number of trainable LoRA parameters, N_LLM is the number of (frozen) LLM backbone parameters, and α, β, γ are fitted constants (a fitting sketch follows this list).
- Performance on standard metrics (e.g., Recall@k) improves monotonically with LLM size and adaptation depth, achieving up to a 20% improvement over the peak achievable by SID-based GR at the same data and training budgets.
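As an illustration only, the sketch below fits such a law to made-up (N_LoRA, N_LLM, Recall@k) measurements with scipy; both the data points and the log-linear functional form are assumptions chosen to match the qualitative description above, not values from the paper.

```python
# Fitting a log-linear scaling law Recall@k ≈ γ + α·log(N_LoRA) + β·log(N_LLM).
# The functional form and all data points below are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(params, alpha, beta, gamma):
    n_lora, n_llm = params
    return gamma + alpha * np.log(n_lora) + beta * np.log(n_llm)

# Hypothetical measurements: (trainable LoRA params, frozen LLM params) -> Recall@10
n_lora = np.array([1e6, 4e6, 1e6, 4e6, 1e6, 4e6])
n_llm = np.array([1e9, 1e9, 7e9, 7e9, 14e9, 14e9])
recall = np.array([0.041, 0.046, 0.052, 0.058, 0.057, 0.064])

(alpha, beta, gamma), _ = curve_fit(scaling_law, (n_lora, n_llm), recall)
print(f"alpha={alpha:.4f}, beta={beta:.4f}, gamma={gamma:.4f}")  # beta > 0 indicates gains from LLM scale
```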
3. Semantic and CF Signal Unification
A central claim addressed in this line of research is whether LLMs can model collaborative filtering information:
- Analysis and scaling experiments show that, contrary to prevailing belief, larger LLMs inherently capture more CF knowledge. This is quantitatively supported by the positive α and β coefficients in the scaling formula above. Fitting the scaling law to results across model sizes demonstrates that both semantic and collaborative signals are absorbed as the LLM scales.
- When external CF embeddings (e.g., from SASRec) are provided as input, they yield benefits for small LLMs, but as LLM size increases, the benefit diminishes and eventually becomes negligible—a result incompatible with the hypothesis that the LLM is “blind” to CF signals (a soft-prompt sketch of this integration follows this list).
- Experimental results on cold-start items (unseen during training) further substantiate that LLM-as-RS outperforms SID-based GR, leveraging both semantic generalization and learned CF structure.
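The sketch below illustrates, under assumed dimensions, how an external SASRec-style user embedding could be projected into the LLM's token-embedding space and prepended as a soft prompt; it is a schematic of the comparison setup, not the paper's exact integration.

```python
# Sketch: injecting an external CF embedding (e.g., from SASRec) as a soft-prompt token.
# Dimensions and the linear projection are illustrative assumptions.
import torch
import torch.nn as nn

cf_dim, llm_dim = 64, 2048            # SASRec embedding size vs. LLM hidden size (assumed)
project = nn.Linear(cf_dim, llm_dim)  # trainable adapter mapping CF space into the LLM embedding space

cf_user_embedding = torch.randn(1, cf_dim)              # stand-in for a pretrained SASRec user vector
soft_prompt = project(cf_user_embedding).unsqueeze(1)   # shape: (batch=1, 1 virtual token, llm_dim)

# prompt_token_embeddings would come from the LLM's embedding layer applied to the text prompt,
# e.g., model.get_input_embeddings()(input_ids); here we fake it with random values.
prompt_token_embeddings = torch.randn(1, 32, llm_dim)

# The combined sequence is fed to the LLM via inputs_embeds instead of input_ids.
inputs_embeds = torch.cat([soft_prompt, prompt_token_embeddings], dim=1)
print(inputs_embeds.shape)  # torch.Size([1, 33, 2048])
```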
4. Parameter-Efficient Fine-Tuning and Implementation
The LLM-as-RS paradigm is implemented through parameter-efficient adaptation:
- LoRA adaptation is the principal technique—fine-tuning only a small subset (∼1%) of model parameters, while the vast majority of the LLM remains frozen. This enables rapid adaptation on recommendation datasets and preserves the LLM’s general-purpose language and reasoning capabilities (a configuration sketch follows this list).
- Text prompts encode a user’s history and candidate items with natural language instructions, exploiting the LLM’s open-vocabulary design and transfer capabilities.
- Inference cost is higher than for SID predictions, since the LLM must generate multiple tokens for each recommendation (e.g., full titles vs. short SIDs). The increased compute is justified when accuracy and transferability are the priorities.
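A minimal LoRA configuration using the peft library is sketched below; the base model, rank, and target modules are placeholder choices meant only to show how roughly 1% of the parameters become trainable.

```python
# Sketch of parameter-efficient fine-tuning with LoRA via the peft library.
# Base model, rank, and target modules are placeholder choices, not the paper's exact configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")  # placeholder backbone

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (model-dependent)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically on the order of ~1% of total parameters
# The wrapped model is then trained with a standard causal-LM loss on (prompt, next-item title) pairs.
```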
5. Experimental Evidence and Comparative Analysis
A summary of key findings corroborating the effectiveness of LLM-as-RS:
| Approach | Scaling Saturation | Performance Gain (Recall@k) | Cold-Start Performance | CF Signal Utilization |
|---|---|---|---|---|
| SID-based GR | Rapid (∼10M params) | Plateaus | Limited | Low, bottlenecked by SIDs |
| LLM-as-RS | None observed (up to 14B) | Up to +20% over SID-based GR | Superior | Increasing with LLM scale (β > 0) |
- Experimental curves (e.g., “lora total scaling”) consistently show that LLM-as-RS improves with both LoRA and LLM size, whereas SID-based approaches flatten early.
This suggests that for applications with sufficient computational resources, LLM-as-RS is the preferred paradigm when recall, robustness to unseen items, and transferability are critical design criteria.
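For reference, a simple way such a recall metric can be computed over generated titles is sketched below; exact string matching and the sample data are simplifying assumptions (real evaluations typically normalize titles or use constrained decoding).

```python
# Sketch: Recall@k when the model emits a ranked list of generated item titles per user.
# Exact string matching is a simplifying assumption; real pipelines typically normalize titles.
def recall_at_k(ranked_titles: list[list[str]], ground_truth: list[str], k: int = 10) -> float:
    hits = sum(truth in preds[:k] for preds, truth in zip(ranked_titles, ground_truth))
    return hits / len(ground_truth)

predictions = [
    ["Dune (2021)", "Interstellar (2014)", "Tenet (2020)"],
    ["Heat (1995)", "Collateral (2004)", "Drive (2011)"],
]
truths = ["Dune (2021)", "Drive (2011)"]
print(recall_at_k(predictions, truths, k=2))  # 0.5
```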
6. Limitations and Practical Considerations
- While LLM-as-RS excels on accuracy and scaling, it incurs greater inference cost per recommendation due to text generation. Applications with stringent latency or cost requirements may still opt for hybrid approaches.
- The paradigm’s reliance on pretrained LLMs means that model selection and prompt construction remain areas for further study, particularly with respect to fairness and explainability.
- The empirical scaling law may not extrapolate indefinitely; further research is needed to understand limitations at much larger scales or in settings where item content is sparse.
7. Significance and Outlook
The LLM-as-RS paradigm represents a significant development in foundation models for generative recommendation systems. By leveraging LLMs’ unified treatment of semantic and interaction signals in a scalable, parameter-efficient framework, it demonstrates that the technical bottlenecks imposed by quantization or codebook design in SID-based GR can be surpassed. Its scaling properties and empirical performance strongly motivate further integration of LLMs as central engines in next-generation recommender architectures (Liu et al., 29 Sep 2025).