
SteinGen: Generating Fidelitous and Diverse Graph Samples (2403.18578v2)

Published 27 Mar 2024 in stat.ML and cs.LG

Abstract: Generating graphs that preserve characteristic structures while promoting sample diversity can be challenging, especially when the number of graph observations is small. Here, we tackle the problem of graph generation from only one observed graph. The classical approach of graph generation from parametric models relies on the estimation of parameters, which can be inconsistent or expensive to compute due to intractable normalisation constants. Generative modelling based on machine learning techniques to generate high-quality graph samples avoids parameter estimation but usually requires abundant training samples. Our proposed generating procedure, SteinGen, which is phrased in the setting of graphs as realisations of exponential random graph models, combines ideas from Stein's method and MCMC by employing Markovian dynamics which are based on a Stein operator for the target model. SteinGen uses the Glauber dynamics associated with an estimated Stein operator to generate a sample, and re-estimates the Stein operator from the sample after every sampling step. We show that on a class of exponential random graph models this novel "estimation and re-estimation" generation strategy yields high distributional similarity (high fidelity) to the original data, combined with high sample diversity.
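The "estimation and re-estimation" loop described in the abstract can be sketched in a heavily simplified form. The sketch below is an assumption-laden illustration, not the paper's method: it uses the crudest possible conditional-probability estimator (the global edge density, i.e. a Bernoulli random-graph fit) in place of the paper's Stein-operator-based estimate for exponential random graph models, and takes single-edge Glauber steps, re-estimating from the updated graph before each step. The function name `steingen_sample` and all parameters are hypothetical.

```python
import numpy as np

def steingen_sample(A, n_steps, seed=None):
    """Illustrative SteinGen-style loop (simplified sketch).

    Starting from the observed 0/1 adjacency matrix A, repeatedly:
      1. estimate the conditional edge probability from the CURRENT graph
         (here, crudely, the global edge density -- a Bernoulli stand-in
         for the paper's estimated Stein operator);
      2. take one Glauber step: pick a vertex pair uniformly at random and
         resample its edge indicator from the estimated conditional;
      3. re-estimate from the updated graph before the next step.
    """
    rng = np.random.default_rng(seed)
    A = A.copy()
    n = A.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(n_steps):
        # Re-estimate the edge-conditional probability from the current sample.
        p_hat = A[np.triu_indices(n, k=1)].mean()
        # One Glauber update: resample a uniformly chosen edge indicator.
        i, j = pairs[rng.integers(len(pairs))]
        A[i, j] = A[j, i] = int(rng.random() < p_hat)
    return A
```

Because the conditional is re-estimated from the evolving sample rather than fixed once from the observed graph, the chain can drift away from the input graph (promoting diversity) while the estimate keeps tracking its characteristic statistics (promoting fidelity); the paper's contribution is doing this with a Stein-operator estimate for ERGMs, not the Bernoulli placeholder used here.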

