Solving the Cold-Start Problem in Recommender Systems with Social Tags
(1004.3732v2)
Published 21 Apr 2010 in cs.IR and physics.soc-ph
Abstract: In this paper, based on user-tag-object tripartite graphs, we propose a recommendation algorithm that treats social tags as playing an important role in information retrieval. Besides its low computational cost, experimental results on two real-world data sets, Del.icio.us and MovieLens, show that it can enhance algorithmic accuracy and diversity. In particular, it can obtain more personalized recommendation results when users have diverse topics of tags. In addition, numerical results on how algorithmic accuracy depends on object degree indicate that the proposed algorithm is particularly effective for small-degree objects, which reminds us of the well-known cold-start problem in recommender systems. Further empirical study shows that the proposed algorithm can significantly solve this problem in social tagging systems with heterogeneous object degree distributions.
The paper’s main contribution is a novel diffusion-based algorithm that uses social tags to bridge users and objects, effectively addressing the cold-start problem.
The methodology employs a user-tag-object tripartite graph and evaluates performance on Del.icio.us and MovieLens using ranking and diversity metrics.
Results indicate improved recommendation accuracy for low-degree objects and enhanced diversity when users' tag usage is diverse.
The paper introduces a diffusion-based recommendation algorithm that leverages social tags within a user-tag-object tripartite graph to address the cold-start problem in recommender systems. The algorithm treats social tags as a conduit connecting users to objects, thereby enhancing recommendation accuracy and diversity.
The proposed algorithm is compared against two baseline algorithms:
User-object diffusion
User-object-tag diffusion
The proposed algorithm, user-tag-object diffusion, instead assumes that resources are initially located on tags, in proportion to how often the target user $U_i$ has used each tag. Each tag then distributes its resource equally among its neighboring objects. The final resource vector $f''$ is given by:
$$f''_j = \sum_{l=1}^{r} \frac{a'_{jl}\, a''_{il}}{k'(T_l)}$$
where:
$f''_j$ is the final score of object $O_j$.
$a'_{jl}$ is the object-tag relation: $a'_{jk} = 1$ if object $O_j$ has been labeled with tag $T_k$, and $a'_{jk} = 0$ otherwise.
$a''_{il}$ is the user-tag relation: $a''_{ik}$ is the number of times user $U_i$ has adopted tag $T_k$.
$k'(T_l) = \sum_{j=1}^{m} a'_{jl}$ is the number of objects neighboring tag $T_l$.
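The scoring rule above fits in a few lines of NumPy. This is a minimal sketch, not the authors' code: it assumes the object-tag relation is stored as a dense binary matrix `object_tag` (rows are objects $O_j$, columns are tags $T_l$) and the target user's tag counts as a vector `user_tag_counts`; both names, and the function name, are hypothetical.

```python
import numpy as np


def user_tag_object_scores(object_tag: np.ndarray, user_tag_counts: np.ndarray) -> np.ndarray:
    """Final resources f''_j of every object for one target user U_i.

    object_tag      : (m, r) binary matrix, entry [j, l] = a'_{jl}
    user_tag_counts : (r,) vector, entry [l] = a''_{il}
    returns         : (m,) vector of scores f''_j
    """
    # k'(T_l): number of objects linked to each tag (column sums of A')
    tag_degree = object_tag.sum(axis=0).astype(float)
    # Each tag starts with a''_{il} units of resource and splits them equally
    # among its k'(T_l) objects; tags with no objects pass nothing on.
    share = np.divide(user_tag_counts, tag_degree,
                      out=np.zeros_like(tag_degree), where=tag_degree > 0)
    # f''_j = sum_l a'_{jl} * a''_{il} / k'(T_l)
    return object_tag @ share
```

For a toy case with four objects and three tags, `object_tag = [[1, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]` and a user who used $T_1$ twice and $T_2$ once (`user_tag_counts = [2, 1, 0]`), the scores are 1.5, 0.5, 0 and 1. Before ranking, objects the user has already collected would be filtered out; that step is omitted here. Because the rule is a single matrix-vector product, stacking all users' tag counts as columns scores every user at once.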
The algorithm's advantages are its capacity to generate personalized recommendations, reduced computational time, and the explicit modeling of tags as bridges between users and objects.
The methodology employs two real-world datasets: Del.icio.us and MovieLens. The datasets are preprocessed to remove isolated nodes, ensuring a minimum level of user-object-tag interaction. Statistics of the purified datasets are summarized in Table 1: the number of users (n), objects (m), and tags (r), the average number of objects collected by a user (⟨k⟩), the average number of tags assigned to an object (⟨k′⟩), and the average number of tags adopted by a user (⟨k′′⟩). Each dataset is divided into a training set (90% of the data) and a testing set (10%).
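A minimal sketch of the Table 1 statistics and the 90%/10% split, assuming the raw data is a list of (user, object, tag) triples. The split unit (distinct user-object links, with all tag assignments of a held-out link going to the test set) and the helper names `dataset_stats` and `split_links` are hypothetical; the summary does not say how the split was drawn.

```python
import random
from collections import defaultdict


def dataset_stats(triples):
    """Table-1-style statistics from (user, object, tag) triples."""
    objs_of_user = defaultdict(set)   # objects collected by each user
    tags_of_obj = defaultdict(set)    # distinct tags assigned to each object
    tags_of_user = defaultdict(set)   # distinct tags adopted by each user
    for u, o, t in triples:
        objs_of_user[u].add(o)
        tags_of_obj[o].add(t)
        tags_of_user[u].add(t)
    n, m = len(objs_of_user), len(tags_of_obj)
    r = len({t for _, _, t in triples})
    return {
        "n": n, "m": m, "r": r,
        "<k>": sum(map(len, objs_of_user.values())) / n,
        "<k'>": sum(map(len, tags_of_obj.values())) / m,
        "<k''>": sum(map(len, tags_of_user.values())) / n,
    }


def split_links(triples, test_fraction=0.1, seed=0):
    """Hypothetical random split over distinct (user, object) links."""
    links = sorted({(u, o) for u, o, _ in triples})
    rng = random.Random(seed)
    rng.shuffle(links)
    test_links = set(links[:round(test_fraction * len(links))])
    train = [x for x in triples if (x[0], x[1]) not in test_links]
    test = [x for x in triples if (x[0], x[1]) in test_links]
    return train, test
```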
The performance of the algorithm is evaluated using three metrics:
Ranking Score (RS): for each user-object pair in the testing set, the rank of the object in the user's list of uncollected objects (sorted by descending score), divided by the number of all uncollected objects for that user. A smaller RS means the held-out object is placed nearer the top, so a lower average ⟨RS⟩ indicates higher accuracy.
Inter Diversity (InterD): measures how much the recommendation lists of different users differ. With $O_R^i$ the set of objects recommended to user $U_i$, InterD is calculated as:
$$\mathrm{InterD} = \frac{2}{n(n-1)} \sum_{i<j} \left(1 - \frac{|O_R^i \cap O_R^j|}{L}\right)$$
where $L = |O_R^i|$ is the length of the recommendation list.
Inner Diversity (InnerD): measures the diversity of the objects within each user's recommendation list. InnerD is calculated as:
$$\mathrm{InnerD} = 1 - \frac{2}{nL(L-1)} \sum_{i=1}^{n} \sum_{j<l,\ j,l \in O_R^i} S_{jl}$$
where $S_{jl}=\frac{|\Gamma_{O_j}\cap\Gamma_{O_l}|}{\sqrt{|\Gamma_{O_j}|\times|\Gamma_{O_l}|}}$ is the cosine similarity between objects $O_j$ and $O_l$, and $\Gamma_{O_j}$ denotes the set of users who have collected object $O_j$.
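The three metrics can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: `scores` is a hypothetical dict from object to score for one user, `collected` is that user's set of training objects, every recommendation list has the same length L, `collectors` maps each object to the set of users who collected it, and ties in the ranking are broken by input order.

```python
from itertools import combinations


def ranking_score(scores, collected, target):
    """RS of one held-out object: its rank (1 = top) among the user's
    uncollected objects, divided by the number of uncollected objects.
    Raises ValueError if target is not among the uncollected objects."""
    uncollected = [o for o in scores if o not in collected]
    ranked = sorted(uncollected, key=lambda o: -scores[o])  # stable sort
    return (ranked.index(target) + 1) / len(uncollected)


def inter_diversity(rec_lists):
    """InterD: mean of 1 - |O_R^i & O_R^j| / L over all distinct user pairs."""
    L = len(rec_lists[0])
    pairs = list(combinations(rec_lists, 2))  # n(n-1)/2 unordered pairs
    return sum(1 - len(set(a) & set(b)) / L for a, b in pairs) / len(pairs)


def cosine_similarity(collectors, j, l):
    """S_jl computed from the sets of users who collected O_j and O_l."""
    a, b = collectors[j], collectors[l]
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) * len(b)) ** 0.5


def inner_diversity(rec_lists, collectors):
    """InnerD: 1 minus the mean pairwise similarity inside each list."""
    n, L = len(rec_lists), len(rec_lists[0])
    total = sum(cosine_similarity(collectors, j, l)
                for rec in rec_lists for j, l in combinations(rec, 2))
    return 1 - 2 * total / (n * L * (L - 1))
```

Iterating over unordered pairs with `combinations` is what makes the $2/(n(n-1))$ and $2/(nL(L-1))$ prefactors yield plain averages, so both diversity values stay in $[0, 1]$.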
The results indicate that the proposed algorithm improves accuracy (a lower ⟨RS⟩) for small-degree objects, those with $k_o \le 10$, where $k_o$ is the number of users who have collected the object, but performs worse than the baselines when $k_o > 10$, as shown in Figure 1. Because small-degree objects are precisely the ones affected by the cold-start problem, this suggests that the algorithm is effective at addressing it. Tables 2 and 3 present the overall ⟨RS⟩ values of the three algorithms on the two datasets.
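The degree split described above can be reproduced with a small helper, sketched here under the assumption that each held-out link already has a ranking score and that $k_o$ is counted in the training set; `mean_rs_by_degree` is a hypothetical name. Running it once per algorithm on the same held-out links gives the small- versus large-degree comparison.

```python
def mean_rs_by_degree(records, threshold=10):
    """Mean ranking score for small- and large-degree test objects.

    records   : iterable of (rs, k_o) pairs, one per held-out user-object
                link, where k_o is the object's degree
    threshold : degree cutoff; 10 matches the split discussed in the text
    returns   : (mean RS for k_o <= threshold, mean RS for k_o > threshold),
                with None for an empty group
    """
    records = list(records)  # allow any iterable, consumed twice below

    def mean(values):
        return sum(values) / len(values) if values else None

    small = [rs for rs, k in records if k <= threshold]
    large = [rs for rs, k in records if k > threshold]
    return mean(small), mean(large)
```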
The diversity analysis, presented in Figures 3 and 4, indicates that ⟨InterD⟩ is enhanced only for Del.icio.us. To understand why, the overlapping ratio (OR) of users' tags is quantified as:
$$\langle OR \rangle_g = \frac{1}{N_g} \sum_{i \neq j,\ G(i,j) = g} OR(i,j)$$
where $G(i,j)$ denotes the number of common objects collected by users $i$ and $j$, $N_g$ is the number of user pairs $(i,j)$ with $i \neq j$ and $G(i,j) = g$, and $OR(i,j)$ is the total number of tag agreements on the same objects for the pair $(i,j)$. The results show that $\langle OR \rangle_g$ of tags is smaller than that of objects in Del.icio.us, whereas this is not the case for MovieLens, indicating that diverse tag usage is crucial for generating diverse recommendations.
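A minimal sketch of $\langle OR \rangle_g$, assuming one reading of "tag agreements" that the summary does not confirm: $OR(i,j)$ is taken here as the sum, over the objects both users collected, of the number of tags both assigned to that object. The input layout and the name `mean_overlap_by_common_objects` are hypothetical. Pairs are enumerated once each; since $OR(i,j)$ is symmetric, the per-$g$ average equals the average over ordered pairs with $i \neq j$.

```python
from collections import defaultdict
from itertools import combinations


def mean_overlap_by_common_objects(assignments):
    """<OR>_g for every observed g, under the assumed reading of OR(i, j).

    assignments: dict user -> dict object -> set of tags the user put on it.
    Returns a dict g -> average OR(i, j) over user pairs with G(i, j) = g.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for i, j in combinations(assignments, 2):
        # G(i, j): objects collected by both users
        common = assignments[i].keys() & assignments[j].keys()
        g = len(common)
        # Assumed OR(i, j): tags both users gave to the same object, summed
        overlap = sum(len(assignments[i][o] & assignments[j][o]) for o in common)
        totals[g] += overlap
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in counts}
```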
The Shannon entropy $E(U_i)$ is used to measure individual tag usage patterns:
$$E(U_i) = -\sum_{t} p_{i;t} \ln p_{i;t}$$
where $p_{i;t}$ is the probability that user $U_i$ uses tag $t$; the entropy of an object is defined analogously over the tags it has received. The analysis reveals that $E$ is greater for Del.icio.us than for MovieLens, for both users and objects. This suggests that Del.icio.us is a more diverse system, which explains why the proposed algorithm achieves better ⟨InnerD⟩ in Del.icio.us than in MovieLens.
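A minimal sketch of the entropy computation, assuming $p_{i;t}$ is estimated as the fraction of user $U_i$'s tag assignments that use tag $t$ (the summary does not spell out the estimator); `tag_entropy` is a hypothetical helper name. Passing the tags an object has received instead of a user's tags gives the object-level entropy.

```python
import math
from collections import Counter


def tag_entropy(tags_used):
    """Shannon entropy E(U_i) of one user's tag usage, in nats.

    tags_used: every tag the user applied, one entry per tag assignment,
    so p_{i;t} is estimated as the share of assignments that use tag t.
    """
    counts = Counter(tags_used)
    total = sum(counts.values())
    # E(U_i) = -sum_t p_{i;t} ln p_{i;t}; unused tags contribute nothing
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

A user who spreads four assignments evenly over four tags gets $E = \ln 4 \approx 1.386$, while a user who always uses the same tag gets $E = 0$; higher values therefore signal the more varied tag usage the text associates with Del.icio.us.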