Graph Data Condensation via Self-expressive Graph Structure Reconstruction (2403.07294v2)
Abstract: With the increasing demands of training graph neural networks (GNNs) on large-scale graphs, graph data condensation has emerged as a critical technique for reducing storage and training-time costs. It aims to condense the original large-scale graph into a much smaller synthetic graph while preserving the information necessary to train a downstream GNN efficiently. However, existing methods either concentrate exclusively on optimizing node features or learn the node features and the graph structure generator independently. They cannot explicitly leverage the information in the original graph structure, and they fail to construct an interpretable graph structure for the synthetic dataset. To address these issues, we introduce a novel framework named **G**raph Data **C**ondensation via **S**elf-expressive Graph Structure **R**econstruction (**GCSR**). Our method stands out by (1) explicitly incorporating the original graph structure into the condensation process and (2) capturing the nuanced interdependencies between the condensed nodes by reconstructing an interpretable self-expressive graph structure. Extensive experiments and comprehensive analysis validate the efficacy of the proposed method across diverse GNN models and datasets. Our code is available at https://github.com/zclzcl0223/GCSR.
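The self-expressive reconstruction in (2) can be illustrated with the standard least-squares self-expression formulation from the subspace-clustering literature: each condensed node's feature vector is approximated as a linear combination of the other condensed nodes' features, and the resulting coefficient matrix is read off as the synthetic adjacency. The sketch below is a minimal NumPy illustration of that idea, not the paper's exact procedure; the regularization weight `alpha`, the diagonal zeroing, and the row normalization are illustrative assumptions.

```python
import numpy as np

def self_expressive_structure(X: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Build a self-expressive graph structure over condensed node features X.

    Solves  min_C ||X - C X||_F^2 + alpha * ||C||_F^2  in closed form:
        C = X X^T (X X^T + alpha * I)^{-1},
    so each row of X is expressed as a linear combination of all rows.
    """
    n = X.shape[0]
    gram = X @ X.T                                        # (n, n) similarities among condensed nodes
    C = gram @ np.linalg.inv(gram + alpha * np.eye(n))    # closed-form ridge solution
    np.fill_diagonal(C, 0.0)                              # forbid the trivial self-expression x_i = x_i
    C = np.abs(C)                                         # keep nonnegative edge weights
    A = (C + C.T) / 2.0                                   # symmetrize into an adjacency matrix
    A /= A.sum(axis=1, keepdims=True) + 1e-12             # row-normalize for message passing
    return A

# Example: a synthetic adjacency for 50 condensed nodes with 32-dim features.
X_syn = np.random.randn(50, 32)
A_syn = self_expressive_structure(X_syn, alpha=0.1)
print(A_syn.shape)  # (50, 50)
```

Because the solve is a single n-by-n linear system over the small condensed graph, the structure can be cheaply re-derived whenever the condensed node features are updated, which is what makes a closed-form self-expressive layer attractive inside a condensation loop.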