On the Convexity of Latent Social Network Inference (1010.5504v1)

Published 26 Oct 2010 in cs.SI and physics.soc-ph

Abstract: In many real-world scenarios, it is nearly impossible to collect explicit social network data. In such cases, whole networks must be inferred from underlying observations. Here, we formulate the problem of inferring latent social networks based on network diffusion or disease propagation data. We consider contagions propagating over the edges of an unobserved social network, where we only observe the times when nodes became infected, but not who infected them. Given such node infection times, we then identify the optimal network that best explains the observed data. We present a maximum likelihood approach based on convex programming with an l1-like penalty term that encourages sparsity. Experiments on real and synthetic data reveal that our method near-perfectly recovers the underlying network structure as well as the parameters of the contagion propagation model. Moreover, our approach scales well as it can infer optimal networks of thousands of nodes in a matter of minutes.

Authors

Seth A. Myers
Jure Leskovec

Summary

Convexity in Latent Social Network Inference

The paper "On the Convexity of Latent Social Network Inference" reconstructs latent social networks from diffusion data in settings where the times at which nodes become infected are known but the transmission paths are not. It addresses a central difficulty in social network analysis: inferring an entire network when interactions between nodes cannot be observed directly.

The authors cast the inference problem as a convex program built on a model of how contagions spread across an unobserved network. This differs from traditional approaches, which rely heavily on available pairwise interaction data and threshold-based edge selection. Instead, the model assumes that the timestamps of infection are observable while the specific pathways through which the infections spread are not.
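The observation setting described above can be made concrete with a small simulator. The following is a minimal sketch under stated assumptions, not the authors' code: the names `simulate_cascade`, `A`, and `sample_delay` are hypothetical, each infected node is assumed to get a single chance to infect each neighbor, and a successful transmission is assumed to take a positive random delay. Only the infection times are returned, so the information about who infected whom is discarded, as in the paper's setting.

```python
import heapq
import math
import random


def simulate_cascade(A, source, sample_delay, rng):
    """Return one cascade's infection times; math.inf marks never-infected nodes.

    A[j][i] is the assumed probability that an infected j transmits to i.
    sample_delay(rng) draws a positive delay between infection of j and of i.
    """
    n = len(A)
    times = [math.inf] * n
    times[source] = 0.0
    done = [False] * n
    heap = [(0.0, source)]  # nodes are processed in order of infection time
    while heap:
        t, j = heapq.heappop(heap)
        if done[j]:
            continue  # stale entry: j was already processed at an earlier time
        done[j] = True
        for i in range(n):
            if i == j or done[i] or A[j][i] <= 0.0:
                continue
            if rng.random() < A[j][i]:  # j's single transmission attempt on i
                t_i = t + sample_delay(rng)
                if t_i < times[i]:  # keep only the earliest infection time
                    times[i] = t_i
                    heapq.heappush(heap, (t_i, i))
    return times


# Usage: a three-node chain 0 -> 1 -> 2 with certain transmission, unit delay.
chain = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
print(simulate_cascade(chain, 0, lambda r: 1.0, random.Random(0)))
```

An inference method would see only lists like the one printed here, one per cascade, and would have to recover `A` from them.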

Convex Programming Solution

Central to this research is a maximum likelihood estimation (MLE) framework built on a generative model of information diffusion across an unobserved static network. The goal is to maximize the likelihood of the observed infection times given a candidate network and a set of propagation parameters, and the paper reformulates this problem as a convex optimization task. Because the reformulated problem is convex, any local optimum is also a global one, so a globally optimal solution can be found.

The likelihood function is a product of two factors: the probability that each infected node became infected at its observed time given the nodes infected before it, and the probability that each uninfected node escaped infection. The Hessian of this likelihood can be indefinite, so convexity is not evident in the original variables. The authors establish it by deriving an equivalent geometric program, which makes the problem practical to solve for networks with thousands of nodes.
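The two factors of the likelihood can be sketched for a single node as follows. This is an illustrative reading of the summary, not the paper's implementation: the function name `node_log_likelihood`, the argument `A_in` (candidate transmission probabilities on edges into node i), and the delay function `w` are assumptions, and a node with no earlier infected node is assumed to be the cascade's seed and is skipped.

```python
import math


def _safe_log(x):
    """log(x), with -inf for x <= 0 so impossible events do not raise."""
    return math.log(x) if x > 0.0 else -math.inf


def node_log_likelihood(i, A_in, cascades, w):
    """Log-likelihood contribution of the edges pointing into node i.

    A_in[j]  : candidate probability that j transmits to i (hypothetical name).
    cascades : list of infection-time lists; math.inf means never infected.
    w(dt)    : assumed probability (or density) of a transmission delay dt > 0.
    """
    total = 0.0
    for times in cascades:
        t_i = times[i]
        if math.isinf(t_i):
            # Factor 2: i stayed uninfected, so every infected j failed to transmit.
            for j, t_j in enumerate(times):
                if j != i and not math.isinf(t_j):
                    total += _safe_log(1.0 - A_in[j])
        else:
            earlier = [(j, t_j) for j, t_j in enumerate(times) if t_j < t_i]
            if not earlier:
                continue  # assumption: i seeded this cascade
            # Factor 1: at least one earlier node infected i exactly at time t_i.
            none_succeeded = 1.0
            for j, t_j in earlier:
                none_succeeded *= 1.0 - w(t_i - t_j) * A_in[j]
            total += _safe_log(1.0 - none_succeeded)
    return total


# Usage: one cascade in which node 0 is the seed, node 1 is infected at t=1,
# and node 2 is never infected; delays are assumed to be exactly one step.
unit_delay = lambda dt: 1.0 if dt == 1.0 else 0.0
cascade = [[0.0, 1.0, math.inf]]
print(node_log_likelihood(1, [0.5, 0.0, 0.0], cascade, unit_delay))
print(node_log_likelihood(2, [0.2, 0.4, 0.0], cascade, unit_delay))
```

Note that this function depends only on the weights of edges entering node i, which is what allows the problem to be split per node.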

Incorporating Sparsity and Computational Efficiency

To account for the typically sparse nature of social networks, the paper adds an l1-like penalty to the log-likelihood, which promotes sparsity in the inferred adjacency matrix A. This matches real-world networks, where each node connects to only a small subset of the others, and it drives the weights of unsupported edges toward zero so that the method concentrates on identifying which edges are present. To keep the computation tractable, the problem is split into smaller, independent subproblems, one for each node's incoming edges, which reduces the number of variables handled at once and allows the subproblems to be solved separately.
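The combination of an l1 penalty and a per-node decomposition can be sketched as below. Everything here is a hedged illustration: `penalized_objective`, `solve_column`, `infer_network`, and the penalty weight `rho` are hypothetical names, the coordinate-wise grid search is a toy stand-in for a real convex solver, and the usage example substitutes a simple concave quadratic for the paper's log-likelihood so that the effect of the penalty is easy to verify by hand.

```python
def penalized_objective(a_in, loglik, rho):
    """Negative log-likelihood of one node's incoming edges plus an l1 penalty."""
    return -loglik(a_in) + rho * sum(abs(a) for a in a_in)


def solve_column(objective, n, skip, steps=100, sweeps=5):
    """Toy coordinate-wise grid search over [0, 1]; a stand-in for a convex solver."""
    a = [0.0] * n
    grid = [k / steps for k in range(steps + 1)]
    for _ in range(sweeps):
        for j in range(n):
            if j == skip:
                continue  # no self-loop: the weight of edge i -> i stays 0
            a[j] = min(grid, key=lambda v: objective(a[:j] + [v] + a[j + 1:]))
    return a


def infer_network(n, loglik_for_node, rho):
    """Solve one independent subproblem per node; column i holds edges j -> i."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        loglik = loglik_for_node(i)
        column = solve_column(lambda a: penalized_objective(a, loglik, rho), n, skip=i)
        for j in range(n):
            A[j][i] = column[j]
    return A


# Usage with a stand-in concave "log-likelihood" peaked at target weights T.
T = [[0.0, 0.6], [0.05, 0.0]]


def toy_loglik_for_node(i):
    return lambda a: -sum((a[j] - T[j][i]) ** 2 for j in range(len(T)) if j != i)


print(infer_network(2, toy_loglik_for_node, rho=0.0))  # weights match T
print(infer_network(2, toy_loglik_for_node, rho=0.2))  # weak edge 1 -> 0 is zeroed
```

With the quadratic stand-in, the penalized optimum for each weight is max(0, target - rho/2), so the strong edge shrinks slightly while the weak one is removed entirely. Because each column is solved without reference to the others, the loop in `infer_network` could also be run in parallel.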

Empirical Evaluation

Experiments on synthetic networks, such as scale-free and Erdős–Rényi graphs, and on real-world datasets, including a collaboration network and an email communication network, demonstrate the efficacy of the proposed ConNIe (Convex Network Inference) model. The method consistently achieves high precision-recall break-even points and outperforms existing algorithms such as NetInf, most clearly when edge weights are heterogeneous.
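The precision-recall break-even point used as the evaluation metric can be computed as in the sketch below. This is one common way to compute it, assumed here rather than taken from the paper: inferred edges are ranked by weight and the top k are kept, where k is the number of true edges, so that precision and recall coincide. The function name `pr_break_even` and the example edges are hypothetical.

```python
def pr_break_even(scores, true_edges):
    """Precision (= recall) when keeping the k best-scored edges, k = |true_edges|.

    scores     : dict mapping a candidate edge (j, i) to its inferred weight.
    true_edges : set of edges in the ground-truth network.
    Assumes at least k candidate edges are scored; ties are broken arbitrarily.
    """
    k = len(true_edges)
    if k == 0:
        raise ValueError("the ground-truth network has no edges")
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for edge in top_k if edge in true_edges)
    return hits / k


# Usage: two true edges, of which only one ranks among the top two predictions.
scores = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.3, (2, 0): 0.1}
print(pr_break_even(scores, {(0, 1), (0, 2)}))
```

A value of 1.0 means the k highest-weighted inferred edges are exactly the true edges.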

Additionally, ConNIe remains accurate across different transmission models and under varying levels of noise in the infection data. These results suggest that the model can tolerate the complexities and imperfections of real-world data.

Implications and Future Directions

The work offers a scalable and mathematically sound framework for network inference from incomplete data, a common challenge in fields ranging from epidemiology to marketing. The ability to infer both network structure and the dynamics of the processes running over it supports a better understanding of information flow, influence dynamics, and the roles played by nodes within a network.

Future research directions could include refining the model to infer diffusion parameters concurrently with network structure, potentially using unsupervised learning techniques to fit the incubation-time distribution w(t) to the observed data. Additionally, applying the model to other domains, such as tracking information dissemination in online social networks or modeling control strategies for disease outbreaks, could yield further insights into complex networks.

The research provides a foundational method for latent network inference and a starting point for future work on more dynamic and heterogeneous social environments.