Encoder Embedding for General Graph and Node Classification (2405.15473v2)
Abstract: Graph encoder embedding, a recent technique for graph data, offers speed and scalability in producing vertex-level representations from binary graphs. In this paper, we extend the applicability of this method to a general graph model, which includes weighted graphs, distance matrices, and kernel matrices. We prove that the encoder embedding satisfies the law of large numbers and the central limit theorem on a per-observation basis. Under certain conditions, it achieves asymptotic normality on a per-class basis, enabling optimal classification through discriminant analysis. These theoretical findings are validated through a series of experiments involving weighted graphs, as well as text and image data transformed into general graph representations using appropriate distance metrics.
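To make the pipeline in the abstract concrete, here is a minimal NumPy sketch of encoder embedding followed by linear discriminant analysis. It assumes the normalized one-hot form of the encoder, Z = AW with W(i, k) = 1/n_k when vertex i carries label k and 0 otherwise (unlabeled vertices get a zero row). The function names `encoder_embedding`, `lda_fit`, and `lda_predict`, the simulated weighted two-block graph, and the Gaussian features used to build a distance matrix are all hypothetical illustrations. This is not the authors' implementation. The same embedding call accepts a weighted adjacency matrix or a distance matrix, which mirrors the "general graph" setting.

```python
import numpy as np

def encoder_embedding(A, y, K):
    """Sketch of one-hot encoder embedding, Z = A @ W (assumed form).

    A : (n, n) general graph matrix (binary/weighted adjacency, distance, or kernel).
    y : (n,) integer labels in {0, ..., K-1}; -1 marks an unlabeled vertex.
    """
    n = A.shape[0]
    W = np.zeros((n, K))
    for k in range(K):
        idx = y == k
        if idx.any():
            W[idx, k] = 1.0 / idx.sum()  # weight each labeled vertex by 1 / class size
    return A @ W  # row i: average connection (or distance) of vertex i to each class

def lda_fit(Z, y, K):
    """Gaussian LDA with a pooled covariance estimate."""
    means = np.vstack([Z[y == k].mean(axis=0) for k in range(K)])
    priors = np.array([(y == k).mean() for k in range(K)])
    centered = Z - means[y]
    S = centered.T @ centered / (len(y) - K)
    return means, priors, np.linalg.pinv(S)

def lda_predict(Z, means, priors, S_inv):
    """Assign each row to the class with the largest linear discriminant score."""
    quad = np.sum((means @ S_inv) * means, axis=1)  # mu_k^T S^{-1} mu_k
    scores = Z @ S_inv @ means.T - 0.5 * quad + np.log(priors)
    return scores.argmax(axis=1)

rng = np.random.default_rng(0)
n, K = 200, 2
y = np.repeat(np.arange(K), n // K)

# Hypothetical weighted two-block graph with Uniform(0, 1) edge weights.
B = np.array([[0.5, 0.2], [0.2, 0.5]])
P = B[y][:, y]                                   # (n, n) edge probabilities
U = np.triu((rng.random((n, n)) < P) * rng.random((n, n)), k=1)
A = U + U.T                                      # symmetric, zero diagonal

# Hide about 25% of the labels and predict them from the embedding.
test = rng.random(n) < 0.25
train = ~test
y_obs = np.where(test, -1, y)

Z = encoder_embedding(A, y_obs, K)
model = lda_fit(Z[train], y_obs[train], K)
accuracy = (lda_predict(Z[test], *model) == y[test]).mean()

# Same pipeline on a Euclidean distance matrix built from hypothetical features.
X = rng.normal(size=(n, 5)) + 2.0 * y[:, None]
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
Z_dist = encoder_embedding(D, y_obs, K)
model_dist = lda_fit(Z_dist[train], y_obs[train], K)
acc_dist = (lda_predict(Z_dist[test], *model_dist) == y[test]).mean()

print(f"weighted graph accuracy: {accuracy:.3f}, distance matrix accuracy: {acc_dist:.3f}")
```

The embedding reduces an n × n matrix to n × K with a single matrix product. This reduction is the source of the speed and scalability the abstract mentions. LDA is a natural choice here because the paper's per-class asymptotic normality result is what makes a Gaussian discriminant rule appropriate on the embedded rows.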