Bayesian inference on random simple graphs with power law degree distributions
(1702.08239v2)
Published 27 Feb 2017 in stat.ML
Abstract: We present a model for random simple graphs with a degree distribution that obeys a power law (i.e., is heavy-tailed). To attain this behavior, the edge probabilities in the graph are constructed from Bertoin-Fujita-Roynette-Yor (BFRY) random variables, which have been recently utilized in Bayesian statistics for the construction of power law models in several applications. Our construction readily extends to capture the structure of latent factors, similarly to stochastic blockmodels, while maintaining its power law degree distribution. The BFRY random variables are well approximated by gamma random variables in a variational Bayesian inference routine, which we apply to several network datasets for which power law degree distributions are a natural assumption. By learning the parameters of the BFRY distribution via probabilistic inference, we are able to automatically select the appropriate power law behavior from the data. In order to further scale our inference procedure, we adopt stochastic gradient ascent routines where the gradients are computed on minibatches (i.e., subsets) of the edges in the graph.
The paper introduces a Bayesian model that leverages truncated BFRY variables to naturally produce power-law degree distributions characterized by a tunable exponent.
It employs scalable variational Bayesian inference with stochastic gradient ascent and minibatch training to efficiently estimate latent node weights and hyperparameters.
Experiments on simulated and real-world networks demonstrate improved predictive performance over traditional Gamma-based models.
This paper introduces a Bayesian model for simple random graphs that naturally exhibits power-law degree distributions, a characteristic found in many real-world networks like social networks and biological interaction graphs. Unlike previous theoretical work that established conditions for scale-free graphs within frameworks like the generalized random graph (GRG) model, this paper provides a practical construction and a scalable inference method.
The core of the proposed model is the use of truncated Bertoin--Fujita--Roynette--Yor (BFRY) random variables to define node-specific weights, denoted W_i for node i. In a GRG model, the probability p_{i,j} of an edge between nodes i and j is determined by node parameters, often through the odds ratio r_{i,j} = p_{i,j} / (1 − p_{i,j}). This paper sets r_{i,j} = U_i U_j, where U_i = W_i / L and L = ∑_k W_k.
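As a concrete sketch of this construction (using gamma-distributed placeholder weights in place of the BFRY weights, and a hypothetical `sample_grg` helper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_grg(W, rng=rng):
    """Sample a simple graph with odds ratios r_ij = U_i * U_j,
    where U_i = W_i / L and L = sum_k W_k."""
    n = len(W)
    U = W / W.sum()
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            r = U[i] * U[j]                # odds ratio for edge (i, j)
            p = r / (1.0 + r)              # edge probability p_ij = r / (1 + r)
            if rng.random() < p:
                A[i, j] = A[j, i] = 1      # undirected, no self-loops
    return A

W = rng.gamma(1.0, 1.0, size=30)           # placeholder weights, not BFRY
A = sample_grg(W)
```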
Key aspects of the model include:
Power-Law Generation: BFRY random variables possess heavy tails, which are essential for generating power-law degree distributions in the resulting graph. The density of the truncated BFRY variables depends on a discount parameter α ∈ (0, 1). The authors prove that the graph's degree distribution follows a power law with exponent τ = 1 + α, allowing exponents in the range (1, 2).
Truncation and Sparsity: The BFRY variables are upper truncated at a value C_n (dependent on the number of nodes n). This truncation ensures that the node weights have finite moments, a requirement of the specific GRG construction used, and allows the graph's sparsity to be controlled. Setting C_n = n^β for some β > 0, the expected number of edges scales as O(n^{1 + β(1 − α)}), providing a mechanism to decouple the power-law behavior (controlled by α) from the density (controlled by β).
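A minimal sampler for such truncated weights, assuming the standard representation of a BFRY(α) draw as G / U^(1/α) with G ~ Gamma(1 − α, 1) and U ~ Uniform(0, 1), and handling the truncation at C_n by rejection (the paper's own sampling scheme may differ):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_truncated_bfry(alpha, C, size, rng=rng):
    """Draw BFRY(alpha) variables conditioned on being <= C, by rejection."""
    out = np.empty(size)
    filled = 0
    while filled < size:
        m = size - filled
        G = rng.gamma(1.0 - alpha, 1.0, size=m)   # G ~ Gamma(1 - alpha, 1)
        U = rng.uniform(size=m)
        X = G / U ** (1.0 / alpha)                # heavy-tailed BFRY draw
        keep = X[X <= C]                          # enforce truncation at C
        out[filled:filled + keep.size] = keep
        filled += keep.size
    return out

n, alpha, beta = 2000, 0.5, 0.5
W = sample_truncated_bfry(alpha, n ** beta, n)    # truncation level C_n = n^beta
```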
Latent Factor Extension: The model extends naturally to incorporate latent factors, similarly to stochastic blockmodels. This is achieved by scaling the odds ratios as r_{i,j} = A_{i,j} U_i U_j, where A_{i,j} depends on latent variables (e.g., cluster assignments). The paper shows that if the A_{i,j} are uniformly bounded, this scaling does not affect the asymptotic power-law degree distribution. Examples include discrete cluster assignments and mixed memberships.
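The scaled-odds construction can be sketched with a hypothetical symmetric cluster-affinity matrix B and assignments z; keeping the entries of B bounded is what preserves the asymptotic degree distribution:

```python
import numpy as np

rng = np.random.default_rng(2)

n, K = 100, 3
W = rng.gamma(1.0, 1.0, size=n)          # placeholder node weights
U = W / W.sum()
z = rng.integers(0, K, size=n)           # latent cluster assignment per node
B = rng.uniform(0.5, 2.0, size=(K, K))   # bounded cluster-affinity matrix
B = (B + B.T) / 2                        # symmetrize the affinities

# scaled odds ratios r_ij = A_ij * U_i * U_j with A_ij = B[z_i, z_j]
R = B[z[:, None], z[None, :]] * np.outer(U, U)
P = R / (1.0 + R)                        # edge probabilities
np.fill_diagonal(P, 0.0)                 # no self-loops
```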
For statistical inference on the model parameters (the latent weights W_i and the hyperparameters α and β), the paper proposes a variational Bayesian approach.
Variational Inference: An approximate posterior q(W; θ) over the latent weights W is optimized to maximize a lower bound on the marginal likelihood (the evidence lower bound, ELBO). A mean-field assumption is made, q(W; θ) = ∏_i q(W_i; θ_i), where each marginal q(W_i; θ_i) is modeled as a rectified gamma distribution (a gamma distribution truncated at C_n).
Stochastic Gradient Ascent: The variational parameters θ are optimized using stochastic gradient ascent. To handle the expectation in the ELBO, a Monte Carlo approximation based on the reparameterization trick for gamma distributions is employed.
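To illustrate the reparameterization idea for a gamma variational family, here is a sketch in which only the scale parameter is pathwise-reparameterized through the inverse CDF; gradients with respect to the shape parameter require implicit reparameterization and are omitted (the paper's exact estimator may differ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def reparam_gamma(shape, scale, u):
    """Express a Gamma(shape, scale) draw as a deterministic function
    of uniform noise u, so the sample is differentiable in scale."""
    return scale * stats.gamma.ppf(u, shape)

u = rng.uniform(size=20000)               # noise drawn once, reusable across steps
z = reparam_gamma(2.0, 1.5, u)            # Monte Carlo sample from q
elbo_term = np.mean(np.log1p(z))          # example expectation E_q[log(1 + W)]
```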
Minibatch Training: To scale to large networks, the gradient computations are performed on minibatches of edges, approximating the full gradient over all edges. This is a standard technique from stochastic optimization applied to the network setting.
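The minibatch rescaling is generic: sample a uniform subset of per-edge terms and scale their sum by total/batch, which keeps the estimate unbiased. A sketch with a stand-in array of per-edge gradient terms:

```python
import numpy as np

rng = np.random.default_rng(4)

def minibatch_sum(terms, batch_size, rng=rng):
    """Unbiased estimate of terms.sum() from a uniform minibatch,
    rescaled by the total-to-batch ratio."""
    idx = rng.choice(terms.size, size=batch_size, replace=False)
    return (terms.size / batch_size) * terms[idx].sum()

per_edge = rng.normal(size=10000)                # stand-in per-edge terms
est = minibatch_sum(per_edge, 500)               # cheap stochastic estimate
exact = minibatch_sum(per_edge, per_edge.size)   # full batch recovers the sum
```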
Hyperparameter Inference: The discount parameter α, which dictates the power-law exponent, is inferred via gradient ascent by fixing the latent weights W to their means under the current variational distribution and maximizing the likelihood with respect to α. This requires approximating a normalization constant and its derivative numerically. The sparsity parameter β was found to be difficult to infer directly and is instead selected by cross-validation in the experiments.
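A sketch of the numerical step, assuming the truncated BFRY density is proportional to x^(−1−α)(1 − e^(−x)) on (0, C_n], with the normalizer approximated by trapezoidal quadrature on a log-spaced grid and its α-derivative by central finite differences (the paper's exact scheme may differ):

```python
import numpy as np
from scipy.special import gammaln

def log_norm_const(alpha, C):
    """log of the truncated-BFRY normalizer via trapezoidal quadrature
    on a log-spaced grid (the integrand spikes near 0 but is integrable)."""
    x = np.geomspace(1e-8, C, 200_000)
    dens = np.exp(np.log(alpha) - gammaln(1.0 - alpha)) \
        * x ** (-1.0 - alpha) * (1.0 - np.exp(-x))
    integral = np.sum((dens[1:] + dens[:-1]) / 2 * np.diff(x))
    return np.log(integral)

def dlog_norm_dalpha(alpha, C, eps=1e-5):
    """Central finite-difference derivative of the log-normalizer in alpha."""
    return (log_norm_const(alpha + eps, C) - log_norm_const(alpha - eps, C)) / (2 * eps)

val = log_norm_const(0.5, 1000.0)    # near 0: little BFRY mass above C = 1000
grad = dlog_norm_dalpha(0.5, 1000.0)
```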
The experimental section demonstrates the practical utility of the model and inference procedure.
Inference of α on simulated data shows that the method can effectively recover the true α values, although with slight overestimation in some regimes.
Comparisons on simulated data confirm that the BFRY model, which captures the power-law structure, achieves better predictive performance (higher test log-likelihood) than a baseline GRG model using Gamma-distributed weights (which do not produce power-law tails).
Evaluation on several real-world network datasets (US airports, openflights, polblogs, Facebook) shows that the BFRY model consistently outperforms the Gamma baseline model. The inferred α values provide insights into the heavy-tailedness of the degree distributions in these networks, indicating, for example, that the Facebook network analyzed has heavier tails (α ≈ 0) than the air traffic or political blog networks.
The paper highlights that controlling sparsity via β is important for predictive performance and suggests that future work could focus on better inference methods for β and implementing the latent factor extensions presented.
In summary, the paper provides a concrete, implementable Bayesian model for power-law simple graphs using truncated BFRY variables, develops a scalable variational inference algorithm leveraging stochastic gradient ascent on minibatches, and demonstrates its effectiveness on both synthetic and real-world network datasets. The model offers a way to statistically infer the power-law exponent from data and provides improved predictive performance compared to models lacking this structure.