Supervised Random Walks: Predicting and Recommending Links in Social Networks (1011.4071v1)

Published 17 Nov 2010 in cs.SI, cs.AI, cs.DS, physics.soc-ph, and stat.ML

Abstract: Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions are we missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node and edge level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.

Summary

The paper introduces Supervised Random Walks, which predict and recommend links by biasing a random walk with node and edge attributes.
It formulates link prediction as an optimization problem and uses gradient-based methods to learn the parameters of the edge strength function.
Experiments on real-world datasets show higher accuracy than baseline models, which makes the method relevant to social network recommendation systems.

Supervised Random Walks: Predicting and Recommending Links in Social Networks

The paper "Supervised Random Walks: Predicting and Recommending Links in Social Networks" by Lars Backstrom and Jure Leskovec introduces an algorithm that predicts and recommends links in social networks by combining network structure with node and edge attributes. The problem under examination is link prediction: given a snapshot of a network, how can we infer which interactions are likely to occur in the near future, or identify existing interactions that are missing from the data?

Algorithm Design

The proposed solution employs a technique called Supervised Random Walks (SRW). In essence, this method biases a random walk on the graph by tailoring the transition probabilities between nodes using node and edge attributes. By training this model in a supervised learning framework, the authors aim to optimize edge strength functions so that the biased random walk favours visiting nodes likely to form new links.
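As a rough illustration of how attributes can bias the walk, the sketch below turns per-edge feature vectors into edge strengths, normalizes them into transition probabilities, and adds a restart to the source node. The logistic strength function, the toy graph, the feature values, and all names (`edge_strengths`, `transition_matrix`, `walk_scores`) are assumptions made for this example rather than details stated in this summary.

```python
import numpy as np

def edge_strengths(features, w):
    """Logistic strength in (0, 1) for each edge: 1 / (1 + exp(-w . psi_uv))."""
    return 1.0 / (1.0 + np.exp(-(features @ w)))

def transition_matrix(n, edges, strengths, source, alpha):
    """Row-stochastic matrix of a strength-biased walk that restarts at `source`."""
    A = np.zeros((n, n))
    for (u, v), a in zip(edges, strengths):
        A[u, v] = A[v, u] = a              # undirected toy graph
    # Normalize each row so stronger edges are followed more often.
    Q = (1.0 - alpha) * A / A.sum(axis=1, keepdims=True)
    Q[:, source] += alpha                  # jump back to the source with prob. alpha
    return Q

def walk_scores(Q, iters=200):
    """Approximate the stationary distribution by power iteration."""
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])
    for _ in range(iters):
        p = p @ Q
    return p

# Hypothetical 4-node graph; one 2-dimensional feature vector per edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
features = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([2.0, 0.0])                   # illustrative weights, not learned
alpha = 0.3

Q = transition_matrix(4, edges, edge_strengths(features, w), source=0, alpha=alpha)
p = walk_scores(Q)
print(p)
```

With these weights the first feature raises the strength of edge (0, 1) above that of edge (0, 2), so the walker leaving node 0 prefers node 1. The sketch assumes every node has at least one edge; an isolated node would need separate handling before the row normalization.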

The approach is formulated as an optimization problem: given a network snapshot and known link formation data, learn the parameters of an edge strength function that yields high predictive performance. This is achieved through iterative computation of PageRank-like scores and their partial derivatives with respect to the model parameters. Key considerations include the choice of the random walk restart parameter (α), loss functions, and the edge strength function.
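The iterative computation of scores and their partial derivatives can be sketched as follows, assuming logistic edge strengths and a dense NumPy representation that only suits small graphs. The function name `walk_and_gradient`, the random feature tensor `psi`, and the toy adjacency matrix are hypothetical; the finite-difference comparison at the end is only a sanity check on the sketch, not part of the method.

```python
import numpy as np

def walk_and_gradient(w, psi, adj, source, alpha, iters=300):
    """Return scores p (n,) and dp/dw (n, k) for a restart walk with logistic strengths."""
    s = 1.0 / (1.0 + np.exp(-(psi @ w)))            # (n, n) logistic values
    a = adj * s                                      # strengths, zero where no edge
    da = (adj * s * (1.0 - s))[:, :, None] * psi     # d a_uv / d w, shape (n, n, k)
    row = a.sum(axis=1)[:, None]                     # (n, 1) row totals
    drow = da.sum(axis=1)[:, None, :]                # (n, 1, k) their derivatives
    Q = (1.0 - alpha) * a / row
    Q[:, source] += alpha
    # Quotient rule on a / row; the restart term does not depend on w.
    dQ = (1.0 - alpha) * (da * row[:, :, None] - a[:, :, None] * drow) / row[:, :, None] ** 2
    n, k = adj.shape[0], w.shape[0]
    p, dp = np.full(n, 1.0 / n), np.zeros((n, k))
    for _ in range(iters):
        # dp_u = sum_j (Q_ju * dp_j + p_j * dQ_ju), then p_u = sum_j p_j * Q_ju
        dp = Q.T @ dp + np.einsum("j,juk->uk", p, dQ)
        p = Q.T @ p
    return p, dp

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
psi = rng.normal(size=(4, 4, 2))                     # made-up per-edge features
w = np.array([0.5, -1.0])
p, dp = walk_and_gradient(w, psi, adj, source=0, alpha=0.3)

# Sanity check: compare against central finite differences.
eps = 1e-6
numeric = np.zeros_like(dp)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = eps
    p_hi, _ = walk_and_gradient(w + step, psi, adj, 0, 0.3)
    p_lo, _ = walk_and_gradient(w - step, psi, adj, 0, 0.3)
    numeric[:, i] = (p_hi - p_lo) / (2 * eps)
print(np.max(np.abs(dp - numeric)))
```

The derivative recursion comes from differentiating the fixed-point equation of the walk, so it reuses the same matrix as the score iteration. A gradient-based optimizer would combine `dp` with the derivative of the chosen loss to update `w`.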

Practical Implementation and Results

Experiments were conducted on real-world datasets, including a Facebook social graph and several large academic collaboration networks. The proposed method outperforms both state-of-the-art unsupervised methods and supervised methods that rely on feature extraction. On the Facebook network, for example, its friend recommendations were more accurate than those of the baseline models.

Key findings can be summarized as follows:

Choice of Loss Function: The Wilcoxon-Mann-Whitney (WMW) loss tracked the evaluation metrics more closely than the alternatives, while the Huber and squared losses did not translate into significant performance gains.
Node and Edge Features: Including specific features like friendship initiation, communication frequency, and common friends significantly influenced the model's success.
Parameter Estimation: The SRW model showed high sensitivity to proper parameter tuning, with certain settings (e.g., restart probability α = 0.3) providing optimal results.
Runtime Considerations: The iterative nature of the training process required substantial computation, particularly when differentiating edge types or running multiple independent training sessions.
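To make the loss-function point in the list above concrete, here is a minimal sketch of a WMW-style loss next to AUC. It assumes the smooth form h(x) = 1 / (1 + exp(-x / b)) applied to every pair of a non-linked candidate and a future-link node; the width `b = 0.01`, the score vectors, and the function names are illustrative assumptions.

```python
import numpy as np

def wmw_loss(p, positives, negatives, b=0.01):
    """Sum of sigmoid((p_l - p_d) / b) over negative nodes l and positive nodes d."""
    diff = p[negatives][:, None] - p[positives][None, :]
    return float(np.sum(1.0 / (1.0 + np.exp(-diff / b))))

def auc(p, positives, negatives):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    diff = p[positives][:, None] - p[negatives][None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

positives = np.array([0, 1])    # nodes that later receive a link (made-up)
negatives = np.array([2, 3])    # nodes that do not (made-up)

good = np.array([0.4, 0.3, 0.2, 0.1])   # positives scored above negatives
bad = good[::-1].copy()                  # ranking reversed

print(wmw_loss(good, positives, negatives), auc(good, positives, negatives))
print(wmw_loss(bad, positives, negatives), auc(bad, positives, negatives))
```

Each term of this loss is a smooth stand-in for a misranked pair, so a low loss corresponds to a high AUC, which is one plausible reading of why this loss tracks ranking metrics well.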

Implications and Future Directions

The research has both practical and academic implications. Practically, the SRW algorithm can improve friend recommendations on social networking platforms and reveal hidden social connections, which may increase user engagement and satisfaction. Academically, it provides a framework for integrating rich feature sets with structural network data, a task that was previously difficult because it depended on heuristic feature engineering.

Theoretically, this work raises new questions about network dynamics and link formation processes. By learning edge strengths from node and edge features in a systematic, gradient-based manner, SRWs represent a shift towards more analytically grounded link prediction methods.

Future advancements may focus on further optimizing the algorithm's efficiency, perhaps by leveraging parallel processing or incorporating additional types of relational data. Other potential developments include extending the SRW approach to handle evolving networks, where node and edge features change over time.

Conclusion

The Supervised Random Walk approach proposed by Backstrom and Leskovec is an effective and innovative solution to the link prediction problem in social networks. It strategically integrates structural and attribute-based data to enhance predictive accuracy. The performance metrics from real-world datasets substantiate its efficacy, marking a significant step forward in network analysis and recommendation systems.