
A Fair Comparison of Graph Neural Networks for Graph Classification (1912.09893v3)

Published 20 Dec 2019 in cs.LG and stat.ML

Abstract: Experimental reproducibility and replicability are critical topics in machine learning. Authors have often raised concerns about their lack in scientific publications to improve the quality of the field. Recently, the graph representation learning field has attracted the attention of a wide research community, which resulted in a large stream of works. As such, several Graph Neural Network models have been developed to effectively tackle graph classification. However, experimental procedures often lack rigorousness and are hardly reproducible. Motivated by this, we provide an overview of common practices that should be avoided to fairly compare with the state of the art. To counter this troubling trend, we ran more than 47000 experiments in a controlled and uniform framework to re-evaluate five popular models across nine common benchmarks. Moreover, by comparing GNNs with structure-agnostic baselines we provide convincing evidence that, on some datasets, structural information has not been exploited yet. We believe that this work can contribute to the development of the graph learning field, by providing a much needed grounding for rigorous evaluations of graph classification models.

Citations (420)

Summary

  • The paper presents a large-scale benchmarking framework with over 47,000 experiments to expose reproducibility issues in GNN evaluations.
  • The study demonstrates that several state-of-the-art GNNs struggle to exploit graph structures, often performing on par with structure-agnostic baselines.
  • The research highlights that incorporating degree information notably enhances performance on social network datasets, suggesting untapped potential in current architectures.

An Evaluation of Graph Neural Networks for Graph Classification

The research paper "A Fair Comparison of Graph Neural Networks for Graph Classification" presents a comprehensive empirical study of the reproducibility and validity of experimental evaluations in the field of Graph Neural Networks (GNNs) for graph classification. The authors undertake a large-scale benchmarking effort designed to provide a reliable comparative foundation for GNN research, addressing the ambiguous experimental procedures and lack of reproducibility that often impede progress in the field.

This investigation centers on five prominent GNN architectures tested across nine commonly used benchmark datasets. Notably, the study comprises over 47,000 individual experiments, underscoring the rigor of the analysis. An essential component of the work is the introduction of structure-agnostic baselines: whenever a GNN fails to outperform such a non-structural alternative, it raises critical questions about whether the model actually exploits the graph structure it is given. A minimal illustration of such a baseline is sketched below.
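
To make the role of these baselines concrete, the following is a minimal sketch of what a structure-agnostic classifier can look like, assuming PyTorch; the class name and layer sizes are illustrative and are not the authors' implementation. The model ignores edges entirely and classifies a graph from a permutation-invariant aggregation (here, a sum) of its node features, so any GNN that cannot beat it is arguably not profiting from the graph topology.

```python
import torch
import torch.nn as nn

class StructureAgnosticBaseline(nn.Module):
    """Illustrative baseline: no message passing, no adjacency matrix.
    A graph is represented by the sum of its node features and classified by an MLP."""

    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, node_features):
        # node_features: (num_nodes, in_dim) tensor for a single graph
        graph_repr = node_features.sum(dim=0)  # permutation-invariant readout, no edges used
        return self.mlp(graph_repr)            # class logits
```

For example, `StructureAgnosticBaseline(in_dim=7, hidden_dim=32, num_classes=2)(torch.randn(20, 7))` produces logits for a hypothetical 20-node graph with 7-dimensional node attributes, without ever seeing its edges.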

Key Findings

  1. Reproducibility Concerns in GNN Research: The paper highlights significant reproducibility issues in existing GNN papers, largely attributed to a lack of standardized data splits, inconsistencies in feature usage, and omissions in hyper-parameter tuning details. The authors' rigorous methodology, which includes predefined data splits and a standardized experimental framework, sets a new benchmark for future studies.
  2. Evaluation Protocols and Results: The research implements a stringent evaluation approach using 10-fold cross-validation for risk assessment combined with an inner holdout for model selection, ensuring a clear separation between model selection and model assessment to avoid biased estimates (the outer/inner pattern is sketched in code after this list). This systematic evaluation exposes substantial gaps between reported and re-evaluated performances, showing that successes reported in the literature often do not hold up under unified testing conditions.
  3. Exploiting Graph Structure: One striking outcome from the paper is the finding that several state-of-the-art GNNs do not effectively exploit graph structure on datasets such as D&D, PROTEINS, and ENZYMES, where structure-agnostic baselines performed equally well or surpassed GNN results. This observation suggests that the structural insights purportedly captured by GNNs on these datasets may be overstated.
  4. Degree Information Impact: For social network datasets, augmenting node features with degree information significantly enhanced performance across most tested models (see the degree-augmentation sketch after this list). This underlines a potential gap in how current GNN architectures leverage structural information and suggests that node degree, a fundamental graph property, remains pivotal for achieving competitive performance.
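
The outer/inner separation described in point 2 is easy to get subtly wrong, so a minimal sketch of the general pattern is given below. It assumes scikit-learn for the splits and hypothetical train_fn and eval_fn callables standing in for training and scoring; the 90/10 inner holdout ratio is an assumption for illustration rather than the paper's exact configuration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

def risk_assessment(labels, candidate_configs, train_fn, eval_fn, n_folds=10, seed=0):
    """Outer k-fold CV estimates generalisation; an inner holdout inside each
    fold selects hyperparameters without ever touching that fold's test split."""
    labels = np.asarray(labels)
    outer = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    fold_scores = []
    for train_idx, test_idx in outer.split(np.zeros(len(labels)), labels):
        # Inner holdout: carve a validation set out of the outer training portion.
        tr_idx, val_idx = train_test_split(
            train_idx, test_size=0.1, stratify=labels[train_idx], random_state=seed)
        # Model selection: pick the configuration with the best validation score.
        best_cfg = max(candidate_configs,
                       key=lambda cfg: eval_fn(train_fn(cfg, tr_idx), val_idx))
        # Model assessment: retrain and score exactly once on the held-out fold.
        model = train_fn(best_cfg, tr_idx)
        fold_scores.append(eval_fn(model, test_idx))
    return float(np.mean(fold_scores)), float(np.std(fold_scores))
```

The key property is that each test fold contributes a single score to the final mean and standard deviation, while hyperparameters are chosen only on the inner validation split.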
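
Point 4 concerns datasets whose nodes carry no attributes of their own, as is typical of social networks. A common remedy, sketched below under the assumption that graphs are NetworkX objects and feature matrices are NumPy arrays whose rows follow graph.nodes() order, is to append the node degree as an additional (or the only) input feature.

```python
import numpy as np
import networkx as nx

def add_degree_feature(graph, features=None):
    """Append a node-degree column to a graph's feature matrix, or create one
    if the graph has no node attributes at all."""
    # Degrees in the same order as graph.nodes(); `features` rows are assumed
    # to follow that same ordering.
    degrees = np.array([graph.degree(v) for v in graph.nodes()],
                       dtype=np.float32).reshape(-1, 1)
    if features is None:
        return degrees  # featureless graphs: degree becomes the sole attribute
    return np.concatenate([features, degrees], axis=1)
```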

Implications and Future Directions

The implications of this paper are notable for both practical applications and the theoretical development of GNNs. Practically, it underscores the need for more nuanced and robust methodologies when evaluating new GNN architectures, particularly the importance of relevant baselines to ascertain genuine improvements. Theoretically, the paper prompts reconsideration of current GNN designs and suggests avenues for enhancing their ability to harness graph structures effectively.

Future research could build upon this benchmark by exploring additional structural features and their implications on GNN performance, investigating adaptive mechanisms for dynamic feature incorporation, and developing new architectures that can surpass the identified benchmarks with genuine structural insights. As graph-based learning continues to evolve, this paper provides crucial insights and a methodological foundation for ensuring scientific rigor and progress in the field.

By offering a reproducible and comprehensive evaluation framework, this work not only identifies existing methodological shortcomings but also paves the way for more reliable technological advancements within the graph learning community.