Solving the Cold-Start Problem in Recommender Systems with Social Tags (1004.3732v2)

Published 21 Apr 2010 in cs.IR and physics.soc-ph

Abstract: In this paper, based on the user-tag-object tripartite graphs, we propose a recommendation algorithm, which considers social tags as an important role for information retrieval. Besides its low cost of computational time, the experiment results of two real-world data sets, Del.icio.us and MovieLens, show it can enhance the algorithmic accuracy and diversity. Especially, it can obtain more personalized recommendation results when users have diverse topics of tags. In addition, the numerical results on the dependence of algorithmic accuracy indicates that the proposed algorithm is particularly effective for small degree objects, which reminds us of the well-known cold-start problem in recommender systems. Further empirical study shows that the proposed algorithm can significantly solve this problem in social tagging systems with heterogeneous object degree distributions.

Citations (347)

Summary

  • The paper’s main contribution is a novel diffusion-based algorithm that uses social tags to bridge users and objects, effectively addressing the cold-start problem.
  • The methodology employs a user-tag-object tripartite graph and evaluates performance on Del.icio.us and MovieLens using ranking and diversity metrics.
  • Results indicate improved recommendation accuracy for low-degree objects and enhanced diversity through personalized tag usage.

The paper introduces a diffusion-based recommendation algorithm leveraging social tags within a user-tag-object tripartite graph to address the cold-start problem in recommender systems. The algorithm posits that social tags serve as a conduit connecting users to objects, thereby enhancing recommendation accuracy and diversity.

The proposed algorithm is compared against two baseline algorithms:

  • User-object diffusion
  • User-object-tag diffusion

In contrast, the proposed algorithm, user-tag-object diffusion, initially places resources on tags according to how often the target user $U_i$ has used them. Each tag then distributes its resource equally among its neighboring objects. The final resource vector $\vec{f''}$ has components:

$$f''_j=\sum_{l=1}^r\frac{a'_{jl}a''_{il}}{k'(T_l)}$$

where:

  • $f''_j$ is the final score of object $O_j$.
  • $a'_{jl}$ represents the object-tag relation, where $a'_{jl} = 1$ if object $O_j$ has been assigned tag $T_l$, and $a'_{jl} = 0$ otherwise.
  • $a''_{il}$ represents the user-tag relation, where $a''_{il}$ is the number of times that user $U_i$ has adopted tag $T_l$.
  • $k'(T_l)$ is the number of neighboring objects for tag $T_l$, where $k'(T_l)=\sum_{j=1}^m a'_{jl}$.
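The scoring formula above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `tag_diffusion_scores` and the dictionary-based inputs are assumptions chosen for readability, and the toy tags and objects are made up.

```python
from collections import Counter

def tag_diffusion_scores(user_tag_counts, object_tags):
    """Score every object for one target user via user-tag-object diffusion.

    user_tag_counts: dict tag -> a''_{il}, how often the target user adopted the tag.
    object_tags:     dict object -> set of tags assigned to it (a'_{jl} = 1).
    Returns dict object -> f''_j.
    """
    # k'(T_l): number of objects that carry each tag.
    tag_degree = Counter(tag for tags in object_tags.values() for tag in tags)
    scores = {}
    for obj, tags in object_tags.items():
        # Each tag passes an equal share a''_{il} / k'(T_l) to every neighboring object.
        scores[obj] = sum(user_tag_counts.get(tag, 0) / tag_degree[tag] for tag in tags)
    return scores

# Hypothetical toy data: three objects, one user who used "python" twice and "web" once.
object_tags = {"o1": {"python", "web"}, "o2": {"python"}, "o3": {"music"}}
user_tag_counts = {"python": 2, "web": 1}
scores = tag_diffusion_scores(user_tag_counts, object_tags)
```

In this toy case `o1` receives 2/2 from "python" plus 1/1 from "web", `o2` receives only the "python" share, and `o3` receives nothing because the user never used "music".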

The algorithm's advantages are its capacity to generate personalized recommendations, reduced computational time, and the explicit modeling of tags as bridges between users and objects.

The methodology employs two real-world datasets: Del.icio.us and MovieLens. The datasets are preprocessed to remove isolated nodes, ensuring a minimum level of user-object-tag interaction. The purified datasets' statistics are summarized in Table 1, presenting the number of users ($n$), objects ($m$), tags ($r$), the average number of objects collected by a user ($\langle k\rangle$), the average number of tags assigned to an object ($\langle k'\rangle$), and the average number of tags adopted by a user ($\langle k''\rangle$). Each dataset is divided into training (90%) and testing (10%) sets.
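A 90/10 division of this kind could be sketched as below. The summary does not say how the split is drawn, so the uniform random shuffle, the fixed seed, the helper name `split_records`, and the `(user, object, tags)` record shape are all assumptions made for illustration.

```python
import random

def split_records(records, test_fraction=0.1, seed=0):
    """Randomly split records into (training, testing) lists.

    Assumption: the split is uniform at random over records; the source
    only states the 90%/10% proportions.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records: (user, object, tags) triples.
records = [(f"u{i % 7}", f"o{i}", ("tag",)) for i in range(100)]
train, test = split_records(records)
```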

The performance of the algorithm is evaluated using three metrics:

  1. Ranking Score ($RS$): Defined as the rank of the object divided by the number of all uncollected objects for the corresponding user.
  2. Inter Diversity ($InterD$): Measures the differences in recommendation lists between users. Given $O^i_R$ as the set of recommended objects for user $U_i$, $InterD$ is calculated as:

    $$InterD = \frac{2}{n(n-1)}\sum_{i\neq j}\left(1-\frac{|O^i_R\cap O^j_R|}{L}\right)$$

    where $L=|O^i_R|$ is the length of the recommendation list.

  3. Inner Diversity ($InnerD$): Measures the diversity of objects within a user's recommendation list. $InnerD$ is calculated as:

    $$InnerD = 1-\frac{2}{nL(L-1)}\sum^n_{i=1}\sum_{j\neq l,\; j,l\in O^i_R}S_{jl}$$

    where $S_{jl}=\frac{|\Gamma_{O_j}\cap\Gamma_{O_l}|}{\sqrt{|\Gamma_{O_j}|\times |\Gamma_{O_l}|}}$ is the cosine similarity between objects $O_j$ and $O_l$, and $\Gamma_{O_j}$ denotes the set of users having collected object $O_j$.
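The three metrics can be sketched as follows. This is an illustrative reading rather than the paper's code: the function names and toy data are hypothetical, the rank is taken as 1-based, and the prefactors $\frac{2}{n(n-1)}$ and $\frac{2}{L(L-1)}$ are interpreted as averaging over unordered pairs, which is an assumption about the intended normalization.

```python
from itertools import combinations
from math import sqrt

def ranking_score(ranked_uncollected, target):
    """Rank of `target` (1-based, an assumption) over the number of uncollected objects."""
    return (ranked_uncollected.index(target) + 1) / len(ranked_uncollected)

def inter_diversity(rec_lists):
    """Mean of 1 - |O_R^i & O_R^j| / L over unordered pairs of users."""
    lists = [set(objs) for objs in rec_lists.values()]
    L = len(lists[0])  # all lists are assumed to share the same length L
    pairs = list(combinations(lists, 2))
    return sum(1 - len(a & b) / L for a, b in pairs) / len(pairs)

def cosine_similarity(users_a, users_b):
    """S_jl for two objects, given the sets of users who collected each."""
    if not users_a or not users_b:
        return 0.0  # assumption: an object nobody collected has zero similarity
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))

def inner_diversity(rec_lists, collectors):
    """1 minus the mean pairwise similarity inside each list, averaged over users."""
    per_user = []
    for objs in rec_lists.values():
        pairs = list(combinations(objs, 2))
        sims = [cosine_similarity(collectors[a], collectors[b]) for a, b in pairs]
        per_user.append(sum(sims) / len(pairs))
    return 1 - sum(per_user) / len(per_user)

# Hypothetical toy data: three users with recommendation lists of length L = 2.
rec_lists = {"u1": ["a", "b"], "u2": ["b", "c"], "u3": ["a", "b"]}
collectors = {"a": {"u1", "u2"}, "b": {"u1"}, "c": {"u3"}}

rs = ranking_score(["x", "y", "z", "w"], "y")   # second of four uncollected objects
inter = inter_diversity(rec_lists)
inner = inner_diversity(rec_lists, collectors)
```

Under the 1-based convention a lower ranking score means the held-out object was placed nearer the top of the list, while higher values of the two diversity measures mean more varied recommendations.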

The results indicate that the proposed algorithm improves the average ranking score $\langle RS\rangle$, particularly for objects with degree $k_o \leq 10$, which suggests that it is effective against the cold-start problem. Tables 2 and 3 present the overall $\langle RS\rangle$ values for the three algorithms on the two datasets. As shown in Figure 1, the algorithm's accuracy is better for $k_o \leq 10$ but worse for $k_o > 10$.

The diversity analysis, presented in Figures 3 and 4, indicates that $\langle InterD\rangle$ is enhanced only for Del.icio.us. The overlapping ratio ($OR$) of tags for users is quantified as:

$$OR_{g} = \frac{1}{N_g}\sum_{i\neq j,\; G(i,j)=g}OR(i,j)$$

where $G(i,j)$ denotes the number of common objects collected by users $i$ and $j$, and $N_g$ is the number of user pairs $(i,j)$ with $i\neq j$ and $G(i,j)=g$. $OR(i,j)$ is defined as the total number of tag agreements on the same objects for user pair $(i,j)$. The results show that $\langle OR\rangle_g$ of tags is smaller than that of objects in Del.icio.us, while it is not the case for MovieLens, indicating that diverse tag usage is crucial for generating diverse recommendations.
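One possible reading of this quantity is sketched below. The summary does not spell out how a "tag agreement" is counted, so this sketch assumes it means a tag that both users assigned to the same commonly collected object. The function name `overlap_by_common_objects` and the nested-dictionary input are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def overlap_by_common_objects(assignments):
    """Average tag agreement OR_g, grouped by the number g of common objects.

    assignments: dict user -> dict object -> set of tags the user gave that object.
    Assumption: OR(i, j) counts tags shared by both users on each common object.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u, v in combinations(assignments, 2):
        common = assignments[u].keys() & assignments[v].keys()
        g = len(common)                                # G(i, j)
        agreements = sum(len(assignments[u][o] & assignments[v][o]) for o in common)
        sums[g] += agreements
        counts[g] += 1                                 # contributes to N_g
    return {g: sums[g] / counts[g] for g in counts}

# Hypothetical toy data.
assignments = {
    "u1": {"o1": {"x", "y"}, "o2": {"z"}},
    "u2": {"o1": {"x"}, "o2": {"z"}},
    "u3": {"o1": {"y"}},
}
or_by_g = overlap_by_common_objects(assignments)
```

Because $OR(i,j)$ is symmetric under this reading, averaging over unordered pairs gives the same value as averaging over ordered pairs with $i\neq j$.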

The Shannon entropy $E(U_i)$ is used to measure individual tag usage patterns:

$$E(U_i) = -\sum_t p_{i;t}\ln(p_{i;t})$$

where $p_{i;t}$ is the probability for tag $t$ used by user $U_i$. The analysis reveals that $E$ is greater for Del.icio.us than for MovieLens, both for users and objects. This suggests that Del.icio.us is a more diverse system, which explains why the proposed algorithm achieves better $\langle InnerD\rangle$ in Del.icio.us than in MovieLens.
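The entropy above can be computed as in the following sketch. It assumes that $p_{i;t}$ is estimated as the relative frequency with which the user applied tag $t$, which the summary does not state explicitly; the function name `tag_entropy` and the example counts are hypothetical.

```python
from math import log

def tag_entropy(tag_counts):
    """Shannon entropy (natural log) of one user's tag usage.

    tag_counts: dict tag -> number of times the user applied the tag.
    Assumption: p_{i;t} is the tag's share of all the user's tag applications.
    """
    total = sum(tag_counts.values())
    probs = [count / total for count in tag_counts.values() if count > 0]
    return -sum(p * log(p) for p in probs)

# A user who spreads usage evenly over two tags versus one who uses a single tag.
even_user = tag_entropy({"python": 3, "web": 3})
focused_user = tag_entropy({"python": 6})
```

The evenly spread user has entropy $\ln 2$, while the single-tag user has entropy zero, matching the intuition that higher entropy reflects more diverse tag usage.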