Solving the Cold-Start Problem in Recommender Systems with Social Tags (1004.3732v2)

Published 21 Apr 2010 in cs.IR and physics.soc-ph

Abstract: In this paper, based on the user-tag-object tripartite graphs, we propose a recommendation algorithm, which considers social tags as an important role for information retrieval. Besides its low cost of computational time, the experiment results of two real-world data sets, Del.icio.us and MovieLens, show it can enhance the algorithmic accuracy and diversity. Especially, it can obtain more personalized recommendation results when users have diverse topics of tags. In addition, the numerical results on the dependence of algorithmic accuracy indicate that the proposed algorithm is particularly effective for small degree objects, which reminds us of the well-known cold-start problem in recommender systems. Further empirical study shows that the proposed algorithm can significantly solve this problem in social tagging systems with heterogeneous object degree distributions.

Summary

The paper’s main contribution is a novel diffusion-based algorithm that uses social tags to bridge users and objects, effectively addressing the cold-start problem.
The methodology employs a user-tag-object tripartite graph and evaluates performance on Del.icio.us and MovieLens using ranking and diversity metrics.
Results indicate improved recommendation accuracy for low-degree objects and enhanced diversity through personalized tag usage.

The paper introduces a diffusion-based recommendation algorithm leveraging social tags within a user-tag-object tripartite graph to address the cold-start problem in recommender systems. The algorithm posits that social tags serve as a conduit connecting users to objects, thereby enhancing recommendation accuracy and diversity.

The proposed algorithm is compared against two baseline algorithms:

User-object diffusion
User-object-tag diffusion

In contrast, the proposed algorithm, user-tag-object diffusion, initially places resource on tags in proportion to how often the target user $U_i$ has used them. Each tag then distributes its resource equally among its neighboring objects. The final resource vector $\vec{f''}$ has components:

$f''_j=\sum_{l=1}^r\frac{a'_{jl}a''_{il}}{k'(T_l)}$

where:

$f''_j$ is the final score of object $O_j$.
$a'_{jl}$ represents the object-tag relation: $a'_{jl} = 1$ if object $O_j$ has been assigned tag $T_l$, and $a'_{jl} = 0$ otherwise.
$a''_{il}$ represents the user-tag relation: the number of times that user $U_i$ has adopted tag $T_l$.
$k'(T_l)$ is the number of neighboring objects of tag $T_l$, that is, $k'(T_l)=\sum_{j=1}^m a'_{jl}$.
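The formula above can be sketched in a few lines of plain Python. This is a minimal illustration for a single target user, not the authors' implementation; the function name `user_tag_object_scores` and the list-based inputs are assumptions made for the example.

```python
def user_tag_object_scores(obj_tag, user_tag_counts):
    """Score every object for one target user.

    obj_tag[j][l]      -- a'_{jl}: 1 if object j carries tag l, else 0
    user_tag_counts[l] -- a''_{il}: how often the target user used tag l
    (Both input layouts are assumptions made for this sketch.)
    """
    m = len(obj_tag)
    r = len(user_tag_counts)
    # k'(T_l): number of objects attached to tag l
    tag_degree = [sum(obj_tag[j][l] for j in range(m)) for l in range(r)]
    scores = [0.0] * m
    for l in range(r):
        if tag_degree[l] == 0 or user_tag_counts[l] == 0:
            continue  # the tag holds no resource or has no neighbors
        share = user_tag_counts[l] / tag_degree[l]  # equal split among objects
        for j in range(m):
            if obj_tag[j][l]:
                scores[j] += share
    return scores


# Toy data: 3 objects, 2 tags; the user used tag 0 twice and tag 1 three times.
obj_tag = [[1, 0], [1, 1], [0, 1]]
user_tag_counts = [2, 3]
scores = user_tag_object_scores(obj_tag, user_tag_counts)
print(scores)  # [1.0, 2.5, 1.5]
```

In the toy data, the scores sum to 5, the total resource initially placed on the tags, so resource is conserved as long as every used tag has at least one neighboring object.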

The algorithm's advantages are its capacity to generate personalized recommendations, reduced computational time, and the explicit modeling of tags as bridges between users and objects.

The methodology employs two real-world datasets: Del.icio.us and MovieLens. The datasets are preprocessed to remove isolated nodes, ensuring a minimum level of user-object-tag interaction. Table 1 summarizes the statistics of the cleaned datasets: the number of users ($n$), objects ($m$), and tags ($r$), the average number of objects collected by a user ($\langle k\rangle$), the average number of tags assigned to an object ($\langle k'\rangle$), and the average number of tags adopted by a user ($\langle k''\rangle$). Each dataset is divided into a training set (90%) and a testing set (10%).
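A 90/10 division of this kind might be sketched as follows. The summary does not say how records are assigned to each set, so the uniform random split over (user, object) records, the function name `split_records`, and the fixed seed are all assumptions made for illustration.

```python
import random


def split_records(records, test_fraction=0.1, seed=0):
    """Randomly split records into (train, test).

    Assumes a uniform random split, which the text does not specify.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: 100 hypothetical (user, object) records.
records = [(u, o) for u in range(10) for o in range(10)]
train, test = split_records(records)
print(len(train), len(test))  # 90 10
```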

The performance of the algorithm is evaluated using three metrics:

Ranking Score ($RS$): For a user-object pair in the test set, the rank of that object in the user's recommendation list divided by the number of all objects the user has not collected. Smaller values mean the held-out object is ranked nearer the top.
Inter Diversity ($InterD$): Measures the differences in recommendation lists between users. Given $O^i_R$ as the set of recommended objects for user $U_i$, $InterD$ is calculated as:

$InterD = \frac{2}{n(n-1)}\sum_{i\neq j}\left(1-\frac{|O^i_R\cap O^j_R|}{L}\right)$

where $L=|O^i_R|$ is the length of the recommendation list.
Inner Diversity ($InnerD$): Measures the diversity of objects within a user's recommendation list. $InnerD$ is calculated as:

$InnerD = 1-\frac{2}{nL(L-1)}\sum^n_{i=1}\sum_{j\neq l,j,l\in O^i_R}S_{jl}$

where $S_{jl}=\frac{|\Gamma_{O_j}\cap\Gamma_{O_l}|}{\sqrt{|\Gamma_{O_j}|\times |\Gamma_{O_l}|}}$ is the cosine similarity between objects $O_j$ and $O_l$, and $\Gamma_{O_j}$ denotes the set of users having collected object $O_j$.
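The two diversity metrics can be sketched as below. The sketch reads both sums as running over unordered pairs, so that each metric is one minus an average over pairs; that reading, the function names, and the dictionary `collectors` (object to set of users) are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def inter_diversity(rec_lists):
    """Average of 1 - overlap/L over unordered pairs of users' lists."""
    L = len(rec_lists[0])  # all lists are assumed to have the same length
    pairs = list(combinations(rec_lists, 2))
    total = sum(1 - len(set(a) & set(b)) / L for a, b in pairs)
    return total / len(pairs)


def cosine_similarity(users_j, users_l):
    """S_jl computed from the sets of users who collected each object."""
    if not users_j or not users_l:
        return 0.0
    return len(users_j & users_l) / sqrt(len(users_j) * len(users_l))


def inner_diversity(rec_lists, collectors):
    """1 minus the mean pairwise similarity inside each list, averaged over users."""
    n = len(rec_lists)
    L = len(rec_lists[0])
    total = 0.0
    for rec in rec_lists:
        for j, l in combinations(rec, 2):  # unordered object pairs
            total += cosine_similarity(collectors[j], collectors[l])
    return 1 - 2 * total / (n * L * (L - 1))


# Toy data: two users, lists of length 2.
rec_lists = [["a", "b"], ["b", "c"]]
collectors = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
print(inter_diversity(rec_lists))              # 0.5
print(inner_diversity(rec_lists, collectors))  # 0.75
```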

The results indicate that the proposed algorithm improves the average ranking score $\langle RS\rangle$, particularly for objects with degree $k_o \leq 10$, which suggests it is effective against the cold-start problem. Tables 2 and 3 present the overall $\langle RS\rangle$ values for the three algorithms on the two datasets. Broken down by object degree, as shown in Figure 1, the algorithm's accuracy is better for $k_o \leq 10$ but worse for $k_o > 10$.
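The ranking score and the split at $k_o = 10$ can be sketched as follows. The function names, the dictionary of per-object scores, and the tie handling (ties keep insertion order) are assumptions made for this example; the summary does not say how ties are broken.

```python
def ranking_score(scores, collected, test_object):
    """Rank of a held-out object among the user's uncollected objects, divided by their count.

    scores      -- dict mapping object -> recommendation score for one user
    collected   -- set of objects the user collected in the training set
    test_object -- the held-out object (must not be in `collected`)
    """
    candidates = [o for o in scores if o not in collected]
    # Highest score first; sorted() is stable, so ties keep insertion order.
    ranked = sorted(candidates, key=lambda o: scores[o], reverse=True)
    return (ranked.index(test_object) + 1) / len(candidates)


def mean_rs_by_degree(records, threshold=10):
    """Average RS separately for objects with degree <= threshold and > threshold.

    records -- iterable of (object_degree, ranking_score) pairs
    """
    low = [rs for k, rs in records if k <= threshold]
    high = [rs for k, rs in records if k > threshold]

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    return mean(low), mean(high)


# Toy data: object "a" is already collected, so three candidates remain.
scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}
print(ranking_score(scores, {"a"}, "c"))  # 1/3: "c" is ranked first of three
print(ranking_score(scores, {"a"}, "b"))  # 1.0: "b" is ranked last
```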

The diversity analysis, presented in Figures 3 and 4, indicates that $\langle InterD\rangle$ is enhanced only for Del.icio.us. The overlapping ratio ($OR$) of tags for users is quantified as:

$OR_{g} = \frac{1}{N_g}\sum_{i\neq j, G(i,j)=g}OR(i,j)$

where $N_g$ is the number of user pairs $(i,j)$ with $i\neq j$ and $G(i,j)=g$, and $G(i,j)$ denotes the number of common objects collected by users $i$ and $j$. $OR(i,j)$ is defined as the total number of tag agreements on the same objects for user pair $(i,j)$. The results show that $OR_g$ of tags is smaller than that of objects in Del.icio.us, while this is not the case for MovieLens, indicating that diverse tag usage is crucial for generating diverse recommendations.
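The grouping step in $OR_g$ can be sketched as below: user pairs are bucketed by their number of common objects $g$, and a pairwise statistic is averaged within each bucket. Because the summary gives $OR(i,j)$ only informally, the sketch takes it as a caller-supplied function `pair_stat`. That parameter, the function name, the use of unordered pairs, and the toy tag-overlap statistic are all assumptions, not the paper's definition.

```python
from collections import defaultdict
from itertools import combinations


def average_by_common_objects(user_objects, pair_stat):
    """Average pair_stat(i, j) over user pairs grouped by g = number of common objects.

    user_objects -- dict mapping user -> set of collected objects
    pair_stat    -- hypothetical stand-in for OR(i, j), supplied by the caller
    """
    sums = defaultdict(float)
    counts = defaultdict(int)  # N_g, counted over unordered pairs
    for i, j in combinations(sorted(user_objects), 2):
        g = len(user_objects[i] & user_objects[j])
        sums[g] += pair_stat(i, j)
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in counts}


# Toy data; the tag-overlap statistic below is illustrative only.
user_objects = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a"}}
user_tags = {"u1": {"x", "y"}, "u2": {"x"}, "u3": {"z"}}
result = average_by_common_objects(
    user_objects, lambda i, j: len(user_tags[i] & user_tags[j])
)
print(result)  # {2: 1.0, 1: 0.0}
```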

The Shannon entropy $E\left(U_i\right)$ is used to measure individual tag usage patterns:

$E\left(U_i\right) = -\sum_t p_{i;t}\ln(p_{i;t})$

where $p_{i;t}$ is the probability that user $U_i$ uses tag $t$. The analysis reveals that $E$ is greater for Del.icio.us than for MovieLens, both for users and objects. This suggests that Del.icio.us is a more diverse system, which explains why the proposed algorithm achieves better $\langle InnerD\rangle$ in Del.icio.us than in MovieLens.
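The entropy of one user's tag usage can be sketched as follows, assuming $p_{i;t}$ is estimated as the relative frequency of tag $t$ among that user's tag assignments. The estimator and the function name `tag_entropy` are assumptions made for this example.

```python
from collections import Counter
from math import log


def tag_entropy(tag_assignments):
    """Shannon entropy (natural log) of a list of tags used by one user."""
    counts = Counter(tag_assignments)
    total = sum(counts.values())
    # p is the relative frequency of each tag (an assumed estimator)
    return -sum((c / total) * log(c / total) for c in counts.values())


# A user who splits usage evenly over two tags has entropy ln 2.
even = tag_entropy(["python", "python", "music", "music"])
# A user who only ever uses one tag has entropy 0.
single = tag_entropy(["python", "python", "python"])
print(even, single)
```

Higher values correspond to usage spread across more tags, which is the sense in which the text calls Del.icio.us the more diverse system.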
