AutoSGNN: Automated Propagation Discovery
- The paper demonstrates a hybrid LLM and evolutionary NAS approach that automates spectral GNN propagation design, yielding top validation accuracy across benchmark datasets.
- AutoSGNN unifies propagation discovery for both homophilic and heterophilic graphs by integrating diverse spectral filters and aggregation rules into a modular search space.
- The framework optimizes candidate architectures using a fitness function based on validation accuracy, achieving competitive efficiency compared to existing NAS methods.
Automated Propagation Discovery (AutoSGNN) is a neural architecture search (NAS) framework for spectral graph neural networks (GNNs) that automates the design of propagation mechanisms. The approach targets both homophilic and heterophilic graph structures, with particular emphasis on unifying the discovery of propagation forms adaptable to varying homophily levels. AutoSGNN jointly leverages an LLM for generative architecture proposals and evolutionary strategies (ES) for iterative model selection, achieving state-of-the-art accuracy and efficiency across a spectrum of graph learning benchmarks (Mo et al., 2024).
1. Search Space Formalization for Spectral Propagation
AutoSGNN defines a unified, modular search space for spectral GNNs by abstracting most published spectral GNNs into a universal layer-wise template (Eq. 3 of Mo et al., 2024), which can be written schematically as

$$H^{(l+1)} = \mathrm{COMB}\big(\phi(X),\; g_\theta(\hat{A})\, H^{(l)}\big),$$

where $\phi(X)$ is a feature-fitting term, $g_\theta(\hat{A})$ is a parameterized spectral propagation operator, and $\mathrm{COMB}$ is an aggregation/combination rule.
Here, the architecture space comprises:
- Feature-Fitting Terms: weighted or residual connections, e.g., a weighted self-feature term $\alpha X$ or a residual connection to the input features.
- Spectral Propagation Operators: polynomial spectral filters ($g_\theta(\lambda) = \sum_{k=0}^{K} \theta_k \lambda^k$), Chebyshev/Bernstein polynomial filters, adjacency powers ($\hat{A}^k$), thresholded adjacency matrices, and attention-weighted adjacency.
- Aggregation/Combination Rules: Summation, concatenation, residual addition, gated attention across layers.
The filter $g_\theta(L)$, parametrized by $\theta$, translates into spatial propagation $Z = \sum_{k=0}^{K} \theta_k L^k X$, or in renormalized form $Z = \sum_{k=0}^{K} \theta_k \hat{A}^k X$ with $\hat{A} = \tilde{D}^{-1/2}(A + I)\,\tilde{D}^{-1/2}$. Each filter and aggregation operator forms a discrete or continuous NAS variable within the search grammar.
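As a concrete illustration, the renormalized polynomial propagation above can be sketched in NumPy. This is a minimal dense-matrix sketch; the function names are ours, not the paper's, and a practical implementation would use sparse operations.

```python
import numpy as np

def renormalized_adjacency(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}, the GCN-style renormalization."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def polynomial_propagation(A, X, theta):
    """Spatial form of a polynomial spectral filter: Z = sum_k theta_k * A_hat^k X."""
    A_hat = renormalized_adjacency(A)
    Z = np.zeros_like(X, dtype=float)
    P = X.astype(float)          # A_hat^0 X
    for t in theta:
        Z += t * P               # accumulate the k-th polynomial term
        P = A_hat @ P            # advance to the next adjacency power
    return Z
```

With `theta = [1.0]` the filter reduces to the identity (pure feature fitting); higher-order coefficients mix in increasingly smoothed neighborhood signals.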
2. LLM-Driven Evolutionary Architecture Generation
AutoSGNN’s outer search loop builds upon a hybrid LLM-ES paradigm, with key algorithmic steps:
- Representation: Each candidate GNN is encoded as (1) a Python class implementing the unified form, and (2) a human-readable "design-idea" description.
- Prompt Types:
- Mutation (E1): Asking the LLM for a new spectral filter substantially distinct from the elite (top-performing) candidates.
- Crossover (E2): Prompting the LLM to hybridize features from multiple elite architectures.
- Preference (C1): Having the LLM analyze the strengths/weaknesses of high-vs-low scoring designs, then propose improvements.
- Search Loop: At each generation, new candidates are LLM-generated, trained, and evaluated in parallel. Fitness is assigned by validation accuracy, and the top candidates update the elite set.
The process repeats for a fixed number of generations; the globally best validation performer becomes the final model (Mo et al., 2024).
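The search loop can be sketched as follows. This is a schematic, not the paper's implementation: `llm_propose` and `train_and_evaluate` are mock stubs standing in for the actual LLM call and GNN training, and the population/elite sizes are illustrative.

```python
import random

def llm_propose(prompt_type, elites):
    """Placeholder for an LLM call (E1 mutation, E2 crossover, C1 preference).
    A real system returns a Python class plus a design-idea description;
    here a candidate is just a mock dict."""
    return {"design_idea": f"{prompt_type}-variant", "params": random.random()}

def train_and_evaluate(candidate):
    """Stub fitness: validation accuracy of the trained candidate (mocked)."""
    return candidate["params"]

def autosgnn_search(generations=3, pop_per_gen=4, elite_size=2):
    elites = []  # list of (fitness, candidate), best first
    for _ in range(generations):
        # LLM-generate new candidates via mutation/crossover/preference prompts
        population = [llm_propose(random.choice(["E1", "E2", "C1"]), elites)
                      for _ in range(pop_per_gen)]
        # train and evaluate (in the real system, in parallel)
        scored = [(train_and_evaluate(c), c) for c in population]
        # top candidates update the elite set
        elites = sorted(elites + scored, key=lambda x: x[0], reverse=True)[:elite_size]
    return elites[0]  # globally best validation performer

best_fitness, best_candidate = autosgnn_search()
```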
3. Fitness Function and Optimization Criteria
AutoSGNN adopts pure validation-set accuracy for candidate ranking:

$$F(a) = \mathrm{Acc}_{\mathrm{val}}(a).$$

Optionally, efficiency-aware variants subtract a runtime penalty:

$$F(a) = \mathrm{Acc}_{\mathrm{val}}(a) - \lambda \, T(a),$$

where $T(a)$ is the candidate's training time. Any architecture whose training exceeds a pre-set timeout (e.g., 600 s) is removed from the population ($F(a) = -\infty$).
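A minimal sketch of this ranking rule, with the penalty weight and timeout as illustrative constants (the default `lam=0.0` recovers pure validation accuracy):

```python
import math

TIMEOUT_S = 600.0   # pre-set wall-clock budget per candidate (illustrative)
LAMBDA = 0.0        # runtime penalty weight; 0 => rank by accuracy alone

def fitness(val_accuracy, runtime_s, lam=LAMBDA, timeout=TIMEOUT_S):
    """Validation accuracy, optionally penalized by runtime.
    Candidates exceeding the timeout are effectively removed (fitness -inf)."""
    if runtime_s > timeout:
        return -math.inf
    return val_accuracy - lam * runtime_s
```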
4. Adaptation across Homophilic and Heterophilic Regimes
The search grammar is inherently capable of spanning both homophilic (neighboring nodes share class) and heterophilic (neighbors differ in class) propagation patterns by exposing both low-pass and high-pass spectral mechanisms:
- Homophilic graphs: AutoSGNN’s search naturally gravitates to low-pass filters using powers of the renormalized adjacency ($\hat{A}^k$), with dominant coefficients favoring smooth signal propagation.
- Heterophilic graphs: The system’s prompt structure and grammar allow frequent emergence of thresholded adjacencies and residual feature-injection, enabling the network to focus on strong, anomalous connections or directly propagate node features.
The preference prompts in the LLM-ES pipeline empirically guide spectral filter proposal, matching estimated homophily ratios in the graph data (Mo et al., 2024).
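The homophily ratio referenced above is a standard graph statistic: the fraction of edges whose endpoints share a class label. A minimal sketch (our helper, not part of AutoSGNN's codebase):

```python
import numpy as np

def edge_homophily(edge_index, labels):
    """Fraction of edges whose endpoints share a label.
    Values near 1 indicate a homophilic graph; values near 0, a heterophilic one."""
    src, dst = edge_index
    return float(np.mean(labels[src] == labels[dst]))
```

A graph like Cora typically scores high on this measure, while Texas or Squirrel score low, which is the signal a homophily-aware filter proposal would respond to.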
5. Experimental Protocol and Comparative Results
Extensive experimentation substantiates the effectiveness of AutoSGNN on nine benchmark node-classification datasets, spanning both homophilic (Cora, Citeseer, PubMed, Amazon Computers/Photo) and heterophilic (Chameleon, Squirrel, Texas, Cornell) regimes. Protocol specifics:
- Metric: Node-classification accuracy under 2.5%/2.5%/95% split (train/val/test).
- Comparisons: State-of-the-art spectral GNNs (APPNP, GPRGNN, FAGCN, BernNet, JacobiConv, NFGNN, GCN, ChebNet) and NAS-based methods (GTFGNAS, F2GNN, Genetic-GNN, SANE).
- Results: AutoSGNN achieved the top Wilcoxon–Holm rank on 7/9 datasets; on Cora, for example, its mean accuracy under the 2.5% split exceeded both APPNP and GCN (Mo et al., 2024).
- Efficiency: Each run evaluates a fixed population of LLM-generated candidates per generation, 360 candidates in total. On PubMed, the average full search takes 176 minutes, of which LLM inference accounts for 107 minutes. Compared with differentiable NAS (SANE/F2GNN) and evolutionary NAS (GTFGNAS/Genetic-GNN, 250–300 min), AutoSGNN exhibits competitive wall-clock efficiency (Mo et al., 2024).
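The sparse 2.5%/2.5%/95% node split in the protocol above can be sketched as a simple random partition (function name and seed are ours, for illustration):

```python
import numpy as np

def sparse_split(n_nodes, train_frac=0.025, val_frac=0.025, seed=0):
    """Random train/val/test node split, e.g., 2.5%/2.5%/95%."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_nodes)           # shuffle node indices
    n_train = int(round(train_frac * n_nodes))
    n_val = int(round(val_frac * n_nodes))
    return (perm[:n_train],                   # train indices
            perm[n_train:n_train + n_val],    # validation indices
            perm[n_train + n_val:])           # test indices
```

Under such a split, fitness is computed on the small validation set while the large test set is held out for final reporting.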
6. Broader Methodological Significance
AutoSGNN’s unification of LLM-driven generative architecture proposals with evolutionary population refinement constitutes a hybrid NAS paradigm for GNNs. The success of AutoSGNN demonstrates that this approach can:
- Generalize across the spectrum from homophilic to heterophilic graphs without requiring human-curated filter design.
- Search a broader architectural space than gradient-based (differentiable) NAS frameworks, due to non-reliance on continuous relaxations.
- Incorporate input data characteristics (e.g., homophily statistics) implicitly into the search via preference-guided LLM prompting.
A plausible implication is that similar LLM+ES joint pipelines may accelerate NAS research in other domains where expert-crafted design grammars are insufficient for handling structural diversity, particularly for non-Euclidean or relational data.
7. Relation to Heterogeneous Network NAS Approaches
While AutoSGNN operates primarily in the spectral design domain and is agnostic to node/edge types, related research on NAS for heterogeneous information networks, such as AutoGNR (Li et al., 2025), addresses type-aware propagation-path discovery via non-recursive message passing. Such frameworks search over explicit node-type subset selections at each hop, using bi-level optimization over both model and architecture parameters, and demonstrate direct performance gains from suppressing uncorrelated type aggregations. A distinction thus emerges: AutoSGNN abstracts propagation in spectral space and supports type-agnostic adaptation, whereas frameworks like AutoGNR operate over explicit type paths and non-recursive aggregation. Both advance principled automated discovery of effective graph propagation mechanisms for complex domains.