Link Prediction in Complex Networks: A Survey (1010.0725v1)

Published 4 Oct 2010 in physics.soc-ph, cs.SI, and physics.comp-ph

Abstract: Link prediction in complex networks has attracted increasing attention from both physical and computer science communities. The algorithms can be used to extract missing information, identify spurious interactions, evaluate network evolving mechanisms, and so on. This article summarizes recent progress on link prediction algorithms, emphasizing the contributions from physical perspectives and approaches, such as the random-walk-based methods and the maximum likelihood methods. We also introduce three typical applications: reconstruction of networks, evaluation of network evolving mechanism and classification of partially labelled networks. Finally, we introduce some applications and outline future challenges of link prediction algorithms.

Summary

The paper provides an extensive review of link prediction algorithms, detailing similarity-based, maximum likelihood, and probabilistic methods along with evaluation metrics like AUC and Precision.
It categorizes techniques into similarity-based algorithms, maximum likelihood methods, and probabilistic models to clarify their computational approaches and application nuances.
The survey underscores practical applications in biology, social media, and recommender systems while suggesting future directions including temporal dynamics and hybrid models.

Link Prediction in Complex Networks: An In-Depth Survey

Link prediction in complex networks has emerged as a crucial topic at the intersection of physics and computer science. The paper "Link Prediction in Complex Networks: A Survey" by Linyuan Lü and Tao Zhou offers a comprehensive review of the methodologies and applications of link prediction algorithms. The authors cover structural characteristics, evaluation metrics, and practical applications.

Problem Description and Evaluation Metrics

The paper starts by defining the problem of link prediction in the context of an undirected network $G(V, E)$. The task is to estimate the likelihood that a link exists between two nodes, based on observed links and node attributes. The accuracy of link prediction algorithms is quantified using two standard metrics: area under the receiver operating characteristic curve (AUC) and Precision. The AUC is the probability that a randomly chosen missing link is given a higher score than a randomly chosen nonexistent link, while Precision is the ratio of relevant items selected to the total number of items selected, that is, the fraction of the top-ranked predicted links that are actual missing links.
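The two metrics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `auc` and `precision` and the toy inputs are assumptions. The AUC here compares every (missing, nonexistent) pair exactly and counts ties as half a win; on large networks the same quantity is usually estimated from a random sample of comparisons instead.

```python
def auc(scores_missing, scores_nonexistent):
    """Exact AUC: fraction of (missing, nonexistent) score pairs in which the
    missing link scores higher, with ties counted as one half."""
    higher = 0
    equal = 0
    for m in scores_missing:
        for u in scores_nonexistent:
            if m > u:
                higher += 1
            elif m == u:
                equal += 1
    total = len(scores_missing) * len(scores_nonexistent)
    return (higher + 0.5 * equal) / total


def precision(ranked_links, probe_links, top_l):
    """Fraction of the top_l ranked candidate links that are in the probe
    (held-out) set. ranked_links must already be sorted by decreasing score."""
    hits = sum(1 for link in ranked_links[:top_l] if link in probe_links)
    return hits / top_l


# Hypothetical toy data: scores for two held-out links and two non-links.
example_auc = auc([3, 2], [1, 2])
example_precision = precision([(1, 2), (3, 4), (5, 6)], {(1, 2), (5, 6)}, 2)
```

An AUC of 0.5 corresponds to scoring by pure chance, so the amount by which a method exceeds 0.5 indicates how much better than random it ranks the held-out links.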

Taxonomy of Link Prediction Algorithms

Link prediction algorithms are categorized into three main classes: similarity-based algorithms, maximum likelihood methods, and probabilistic models.

Similarity-Based Algorithms

These algorithms assign a similarity score $s_{xy}$ to each pair of nodes $x$ and $y$. The non-observed links are ranked according to their scores, with higher scores indicating a higher likelihood of link existence.

Local Similarity Indices:

Common Neighbors (CN): Counts the number of common neighbors between nodes $x$ and $y$.
Salton Index: Normalizes common neighbors by the square root of the product of degrees.
Jaccard Index: Normalizes common neighbors by the union of neighbors.
Sørensen Index: Doubles the count of common neighbors and normalizes by the sum of degrees.
Hub Promoted/Depressed Indices (HPI/HDI): Normalize common neighbors by the smaller (HPI) or the larger (HDI) of the two nodes' degrees.
Leicht-Holme-Newman Index (LHN1): Considers the product of degrees as the denominator.
Preferential Attachment (PA): Uses the product of degrees directly.
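Adamic-Adar Index (AA): Assigns higher weights to less-connected common neighbors, weighting each by the inverse logarithm of its degree.
Resource Allocation Index (RA): Treats common neighbors as transmitters of a resource, weighting each by the inverse of its degree.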
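The local indices listed above all depend only on the two neighbor sets and the degrees of the common neighbors, so they can be computed together. The following is a minimal sketch using the standard definitions of these indices; the function name `local_indices`, the dictionary-of-sets graph representation, and the toy graph are assumptions made for illustration. It assumes a simple undirected graph in which both nodes have at least one neighbor.

```python
import math


def local_indices(adj, x, y):
    """Compute local similarity indices for the node pair (x, y).

    adj maps each node to the set of its neighbors (simple undirected graph).
    Both x and y are assumed to have degree >= 1.
    """
    nx, ny = adj[x], adj[y]
    kx, ky = len(nx), len(ny)
    common = nx & ny
    cn = len(common)
    return {
        "CN": cn,
        "Salton": cn / math.sqrt(kx * ky),
        "Jaccard": cn / len(nx | ny),
        "Sorensen": 2 * cn / (kx + ky),
        "HPI": cn / min(kx, ky),
        "HDI": cn / max(kx, ky),
        "LHN1": cn / (kx * ky),
        "PA": kx * ky,
        # A common neighbor of two distinct nodes has degree >= 2,
        # so log(degree) is strictly positive.
        "AA": sum(1 / math.log(len(adj[z])) for z in common),
        "RA": sum(1 / len(adj[z]) for z in common),
    }


# Hypothetical toy graph with edges a-c, a-d, b-c, b-d, b-e, c-e.
toy = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b", "e"},
    "d": {"a", "b"},
    "e": {"b", "c"},
}
scores_ab = local_indices(toy, "a", "b")
```

For the pair (a, b) the common neighbors are c (degree 3) and d (degree 2), so CN is 2 and RA is 1/3 + 1/2. AA and RA differ only in how strongly they penalize high-degree common neighbors.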

Quasi-Local Indices:

Local Path Index (LP): Considers paths of length up to $n$ with a free parameter $\epsilon$.
Local Random Walk (LRW): Uses random walk probabilities over a fixed number of steps.
Superposed Random Walk (SRW): Iteratively releases walkers, emphasizing nearby nodes.
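A rough sketch of the quasi-local indices above, written with plain Python lists so it has no dependencies. It uses the lowest-order form of LP, $A^2 + \epsilon A^3$, where $A$ is the adjacency matrix. For the random-walk indices it assumes each node's walker is weighted by $q_x = k_x / (2|E|)$; that weighting is an assumption of this sketch, as are all function names. SRW is taken to be the sum of the LRW scores over steps 1 through $t$.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def local_path(adj_matrix, eps=0.01):
    """Lowest-order Local Path index: A^2 + eps * A^3."""
    n = len(adj_matrix)
    a2 = matmul(adj_matrix, adj_matrix)
    a3 = matmul(a2, adj_matrix)
    return [[a2[i][j] + eps * a3[i][j] for j in range(n)] for i in range(n)]


def random_walk_indices(adj_matrix, t):
    """Return (LRW at step t, SRW up to step t) for t >= 1.

    Assumes every node has degree >= 1 and uses q_x = k_x / (2|E|).
    """
    n = len(adj_matrix)
    deg = [sum(row) for row in adj_matrix]
    total = sum(deg)  # equals 2|E| for an undirected graph
    # One-step transition matrix: P[i][j] = A[i][j] / k_i.
    p = [[adj_matrix[i][j] / deg[i] for j in range(n)] for i in range(n)]
    pt = p  # holds P^step
    srw = [[0.0] * n for _ in range(n)]
    lrw = srw
    for step in range(1, t + 1):
        if step > 1:
            pt = matmul(pt, p)
        lrw = [[(deg[i] * pt[i][j] + deg[j] * pt[j][i]) / total
                for j in range(n)] for i in range(n)]
        srw = [[srw[i][j] + lrw[i][j] for j in range(n)] for i in range(n)]
    return lrw, srw


# Hypothetical toy graph: the path 0 - 1 - 2.
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
lp = local_path(path_graph, eps=0.01)
lrw2, srw2 = random_walk_indices(path_graph, 2)
```

On the three-node path, the end nodes 0 and 2 share one length-2 path and no length-3 path, so their LP score is 1. Summing over steps in SRW keeps the contribution of short walks, which is what gives nearby nodes more weight.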

Global Similarity Indices:

Katz Index: Considers all paths exponentially damped by their lengths.
SimRank: Defines two nodes as similar if they are connected to similar nodes.
Matrix Forest Index (MFI): Uses matrix forest theorem to measure similarity.
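As an illustration of a global index, the Katz score can be written as the series $\sum_{l \ge 1} \beta^l A^l$, which converges when $\beta$ is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix $A$. The sketch below truncates the series at a fixed path length rather than inverting a matrix; the function names and the cutoff `max_len` are assumptions chosen to keep the example dependency-free.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def katz_truncated(adj_matrix, beta, max_len=50):
    """Approximate the Katz index by summing beta^l * A^l for l = 1..max_len.

    The caller is responsible for choosing beta below 1 / (largest
    eigenvalue of A); otherwise the full series does not converge.
    """
    n = len(adj_matrix)
    power = [row[:] for row in adj_matrix]  # current A^l, starting at l = 1
    coeff = beta                            # current beta^l
    scores = [[coeff * power[i][j] for j in range(n)] for i in range(n)]
    for _ in range(2, max_len + 1):
        power = matmul(power, adj_matrix)
        coeff *= beta
        for i in range(n):
            for j in range(n):
                scores[i][j] += coeff * power[i][j]
    return scores


# Hypothetical toy graph: two nodes joined by one edge (largest eigenvalue 1).
katz = katz_truncated([[0, 1], [1, 0]], beta=0.5)
```

For this two-node graph the odd-length paths connect the two nodes, so the exact score is $\beta / (1 - \beta^2) = 2/3$ for $\beta = 0.5$, which the truncated sum matches closely. A small $\beta$ makes the index behave much like Common Neighbors, because long paths contribute very little.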

Maximum Likelihood Methods

The maximum likelihood methods presuppose some structural organization principles. The Hierarchical Structure Model and Stochastic Block Model are explored. These models infer hierarchical or community structures to predict missing links by maximizing the likelihood of an observed network structure.
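To make the likelihood idea concrete, here is a simplified sketch of a stochastic block model fit for one fixed partition of the nodes into groups. It is not the full method: it assumes the partition is already given, whereas the approach summarized above must also infer the group structure. With the partition fixed, the connection probability between two groups that maximizes the likelihood is the observed number of links between them divided by the number of possible pairs. The function name `sbm_fit` and the toy data are assumptions.

```python
from itertools import combinations
from math import log


def sbm_fit(nodes, edges, group):
    """Fit block connection probabilities for a fixed partition.

    group maps each node to a group label (labels must be sortable).
    Returns (q, loglik): q maps a sorted pair of group labels to the
    maximum-likelihood connection probability; loglik is the resulting
    log-likelihood of the observed network.
    """
    edge_set = {frozenset(e) for e in edges}
    links = {}     # observed links between each pair of groups
    possible = {}  # possible node pairs between each pair of groups
    for x, y in combinations(nodes, 2):
        key = tuple(sorted((group[x], group[y])))
        possible[key] = possible.get(key, 0) + 1
        if frozenset((x, y)) in edge_set:
            links[key] = links.get(key, 0) + 1
    q = {key: links.get(key, 0) / possible[key] for key in possible}
    loglik = 0.0
    for key, prob in q.items():
        # Blocks with prob 0 or 1 fit perfectly and contribute zero.
        if 0 < prob < 1:
            observed = links.get(key, 0)
            loglik += observed * log(prob)
            loglik += (possible[key] - observed) * log(1 - prob)
    return q, loglik


# Hypothetical toy network: two groups of two nodes each.
toy_group = {0: "a", 1: "a", 2: "b", 3: "b"}
toy_q, toy_loglik = sbm_fit([0, 1, 2, 3], [(0, 1), (2, 3), (1, 2)], toy_group)
```

In this simplified setting, an unobserved pair of nodes would be scored by the fitted probability of its group pair, for example `toy_q[("a", "b")]` for a pair with one node in each group.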

Probabilistic Models

Probabilistic models abstract the underlying structure from the observed network. Techniques like Probabilistic Relational Models (PRM), Probabilistic Entity Relationship Models (PERM), and Stochastic Relational Models (SRM) are used. These models optimize a target function to establish a model that best fits the observed network data.

Applications

Link prediction algorithms have extensive applications across various domains:

Biological Networks: Predicting protein-protein interactions, metabolic networks, and food webs can significantly reduce experimental costs.
Social Networks: Enhancing user experiences in online social networks by recommending potential friends or collaborators.
Recommender Systems: Applying link prediction algorithms to user-item bipartite networks to improve the accuracy of recommendations.
Network Reconstruction: Reconstructing networks with missing or spurious links to improve estimates of network properties.
Evaluation of Network Evolving Mechanisms: Comparing different network evolution models by estimating the likelihood of link existence.
Classification in Partially Labeled Networks: Using similarity indices to predict node labels based on known information.

Future Directions

The field of link prediction is poised to benefit further from the integration of temporal dynamics, handling of multi-dimensional networks, and incorporation of external information like node attributes and community structures. The development of hybrid algorithms and ensemble learning methods also presents a promising avenue for enhancing prediction accuracy.

Conclusion

The paper provides a methodical survey of link prediction in complex networks, highlighting the strengths and limitations of various approaches. The insights garnered from this research have profound implications for practical applications and theoretical advancements in network science.