G2LFormer: Global-to-Local Graph Learning
- G2LFormer is a graph learning architecture that integrates a global SGFormer layer with local GNN refinement to capture both long-range and local features.
- The design employs a dedicated cross-layer fusion module (NOSAF) to selectively merge global context with topology-aware information.
- Its global-first ordering and selective fusion mitigate over-smoothing and over-globalization, while its computation scales linearly on sparse graphs.
G2LFormer is a graph learning architecture built around a global-to-local attention mechanism. It is designed to address information loss in graph transformers (GTs) that combine Graph Neural Networks (GNNs) with global attention layers. Prior schemes either apply local aggregation before global attention (local-to-global) or run the two in parallel. G2LFormer reverses the direction: it provides global information first and then refines it with topology-aware local GNN layers. This ordering, together with a dedicated cross-layer fusion strategy, enables effective learning of both long-range and neighborhood structure. It also preserves computational efficiency and mitigates over-smoothing and over-globalization (Wang et al., 18 Sep 2025).
1. Global-to-Local Network Architecture
Given node features $X \in \mathbb{R}^{N \times d}$ and adjacency $A$ for a graph with $N$ nodes and $|E|$ edges, G2LFormer proceeds in three stages:
- Global Layer: A single global-attention layer, implemented as a variant of SGFormer, computes an embedding $Z_G$ that captures long-range interactions among all node pairs.
- Cross-Layer Fusion Module: NOSAF (Node- and Output-Selective Accumulative Fusion) propagates $Z_G$ into the stack of local GNN layers. This provides a globally informed initial context and selectively merges each local layer's output into a running state.
- Local Stack: Downstream, local GNN layers (Cluster-GCN for node-level tasks, GatedGCN for graph-level tasks) operate on the globally aware representation. These layers emphasize structural aggregation over the graph topology and produce the final node embeddings.
The data flow can be summarized as:
| Stage | Input | Output |
|---|---|---|
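| Global attention (SGFormer) | Node features $X$ | Global embedding $Z_G$ |
| Cross-layer fusion (NOSAF) | $Z_G$ and each local layer's output | Fused running state |
| Local GNN stack (Cluster-GCN / GatedGCN) | Fused state, adjacency $A$ | Final node embeddings |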
Each local layer's computation is modulated by the fusion mechanism, allowing the retention and controlled propagation of global information.
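The ordering described above can be sketched as a small Python driver. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `g2l_forward`, `global_plus_mean`, `neighbour_mean`, and `add` are hypothetical, and the toy layers only stand in for SGFormer, the local GNNs, and NOSAF. Feeding each local layer the fused running state, rather than only the previous layer's raw output, is also an assumption.

```python
from typing import Callable, List

Matrix = List[List[float]]
AdjList = List[List[int]]  # adj[i] lists the neighbours of node i


def g2l_forward(
    x: Matrix,
    adj: AdjList,
    global_layer: Callable[[Matrix], Matrix],
    local_layers: List[Callable[[Matrix, AdjList], Matrix]],
    fuse: Callable[[Matrix, Matrix], Matrix],
) -> Matrix:
    """Global-to-local ordering: one global pass, then local layers with fusion."""
    # 1. Global layer first: every node sees all-pair context before message passing.
    state = global_layer(x)
    # 2. Assumption: each local layer reads the fused state; its output is fused back in.
    for layer in local_layers:
        local_out = layer(state, adj)
        state = fuse(state, local_out)
    return state


# Toy stand-ins (hypothetical, for illustration only).
def global_plus_mean(x: Matrix) -> Matrix:
    # Adds the all-node mean to every row: a crude "every node sees every node" step.
    d = len(x[0])
    mean = [sum(row[j] for row in x) / len(x) for j in range(d)]
    return [[row[j] + mean[j] for j in range(d)] for row in x]


def neighbour_mean(h: Matrix, adj: AdjList) -> Matrix:
    # Mean over neighbours; assumes every node has at least one neighbour.
    return [[sum(h[j][f] for j in adj[i]) / len(adj[i]) for f in range(len(h[0]))]
            for i in range(len(h))]


def add(a: Matrix, b: Matrix) -> Matrix:
    # Element-wise sum as a placeholder for NOSAF.
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


out = g2l_forward([[1.0], [2.0], [3.0]], [[1], [0, 2], [1]],
                  global_plus_mean, [neighbour_mean], add)
print(out)  # [[7.0], [8.0], [9.0]]
```

Replacing the toy callables with real SGFormer, Cluster-GCN/GatedGCN, and NOSAF modules would keep the same control flow: one global pass, followed by alternating local updates and fusion.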
2. Global-to-Local Attention Scheme
This attention scheme inverts common GT integration patterns. Rather than a local-to-global or parallel local-plus-global arrangement, G2LFormer applies all-pair global attention to the node features before any message passing occurs. Specifically:
- The single global layer leverages SGFormer with a linearized attention mechanism, ensuring that every node incorporates long-range context.
- The subsequent local GNN layers refine these representations using topology-aware aggregation. Nodes enter the local stack with globally informed features rather than purely local ones, so only a few local layers are needed. This mitigates the over-smoothing typical of deep GNNs.
- The reverse ordering addresses previous deficiencies where local information was overwhelmed ("over-globalization") or where valuable global structure was lost due to stacking schemes.
This sequence ensures that long-range dependencies are available upfront, yet local neighborhoods remain influential during further propagation.
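To make the ordering argument concrete, the sketch below computes which nodes can influence a given node's final embedding. `receptive_field` is a hypothetical helper. It treats one all-pair attention layer as connecting every pair of nodes and each message-passing layer as adding one hop. It tracks structural dependence only, not the strength of each contribution.

```python
from typing import Dict, List, Set


def receptive_field(adj: Dict[int, List[int]], node: int,
                    num_local_layers: int, global_first: bool) -> Set[int]:
    """Nodes whose input features can reach `node` after the layer stack."""
    if global_first:
        # A single all-pair attention layer already connects every pair of nodes.
        field = set(adj)
    else:
        field = {node}
    for _ in range(num_local_layers):
        # One message-passing round extends the field by one hop.
        field |= {nbr for v in field for nbr in adj[v]}
    return field


# A six-node path graph 0-1-2-3-4-5.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(sorted(receptive_field(path, 0, 2, global_first=False)))  # [0, 1, 2]
print(sorted(receptive_field(path, 0, 2, global_first=True)))   # [0, 1, 2, 3, 4, 5]
```

On this path, two local layers alone let node 0 depend only on nodes 0 to 2. Placing a global layer first makes all six nodes available before local refinement begins.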
3. Cross-Layer Information Fusion: NOSAF
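NOSAF introduces a selective fusion pathway between the global and local stacks. It maintains a running state $M^{(\ell)}$, initialized with the global embedding ($M^{(0)} = Z_G$), and performs node-wise adaptive blending at each local layer. For local layer $\ell$ with output $H^{(\ell)}$, the core steps are:
- Summary Formation: Concatenate the accumulated state and the current embedding into a per-node summary:
$$S^{(\ell)} = \big[\, M^{(\ell-1)} \,\Vert\, H^{(\ell)} \,\big] \in \mathbb{R}^{N \times 2d}$$
- Node-wise Gating: Compute a gating vector with one entry per node from the summary, using a learned gating function $\phi^{(\ell)}$:
$$\mathbf{g}^{(\ell)} = \phi^{(\ell)}\big(S^{(\ell)}\big) \in \mathbb{R}^{N}$$
- Feature Filtering: Apply the gate to the local layer output:
$$\tilde{H}^{(\ell)} = \mathbf{g}^{(\ell)} \odot H^{(\ell)}$$
- State Update: Accumulate the modulated features:
$$M^{(\ell)} = M^{(\ell-1)} + \tilde{H}^{(\ell)}$$
Here, $\odot$ denotes the Hadamard product, with $\mathbf{g}^{(\ell)}$ broadcast across the $d$ feature columns so that each node receives its own mixing weight. This filtering lets local embeddings retain useful global and earlier local information, reducing the risk of signal dilution or over-aggregation.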
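A single NOSAF step can be sketched as follows. The concatenated summary, the node-wise gate broadcast over features, the Hadamard filtering, and the additive accumulation follow the steps described in this section. The gate itself is an assumption: a sigmoid of one linear score per node (hypothetical parameters `w`, `b`), standing in for NOSAF's actual learned gating network. `nosaf_step` is also a hypothetical name.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (no overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nosaf_step(state: Matrix, local_out: Matrix, w: List[float],
               b: float = 0.0) -> Tuple[Matrix, List[float]]:
    """One fusion step for a single local layer (assumed linear-sigmoid gate).

    state:     accumulated running state, one row per node (N x d)
    local_out: output of the current local GNN layer (N x d)
    w, b:      hypothetical gate parameters over the 2d-dim summary
    """
    new_state, gates = [], []
    for m_row, h_row in zip(state, local_out):
        summary = m_row + h_row                                # concatenation [m || h]
        score = sum(wi * si for wi, si in zip(w, summary)) + b
        g = _sigmoid(score)                                    # one gate value per node
        filtered = [g * h for h in h_row]                      # gate broadcast over d features
        new_state.append([m + f for m, f in zip(m_row, filtered)])  # accumulate
        gates.append(g)
    return new_state, gates


state = [[1.0, 1.0]]      # e.g. the global embedding of a single node
local_out = [[2.0, 4.0]]  # output of the first local layer
print(nosaf_step(state, local_out, w=[0.0] * 4))  # ([[2.0, 3.0]], [0.5])
```

A gate near 0 leaves the running state untouched, and a gate near 1 adds the full local update. This is the node-wise selectivity the section describes.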
4. Principal Formulations
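Global Layer (SGFormer):
- Projection:
$$Q = X W_Q, \quad K = X W_K, \quad V = X W_V$$
- Normalization:
$$\tilde{Q} = \frac{Q}{\lVert Q \rVert_F}, \quad \tilde{K} = \frac{K}{\lVert K \rVert_F}$$
- Scaling factor:
$$D = \operatorname{diag}\Big(\mathbf{1} + \tfrac{1}{N}\, \tilde{Q}\big(\tilde{K}^{\top} \mathbf{1}\big)\Big)$$
- Output:
$$Z_G = D^{-1}\Big[\, V + \tfrac{1}{N}\, \tilde{Q}\big(\tilde{K}^{\top} V\big) \Big]$$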
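Assuming the standard SGFormer form, the global layer computes $D^{-1}[V + \tfrac{1}{N}\tilde{Q}(\tilde{K}^{\top}V)]$ with $D = \operatorname{diag}(\mathbf{1} + \tfrac{1}{N}\tilde{Q}(\tilde{K}^{\top}\mathbf{1}))$ and Frobenius-normalized $\tilde{Q}, \tilde{K}$. The pure-Python sketch below checks this numerically. `sgformer_attention` uses the linear-order evaluation, forming $\tilde{K}^{\top}V$ and $\tilde{K}^{\top}\mathbf{1}$ first. `sgformer_attention_pairwise` builds the explicit $N \times N$ scores only to confirm that both orders agree. The bias-free projections `wq`, `wk`, `wv` are assumptions, and any residual connection, input projection, or feed-forward block around the attention is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_normalize(a: Matrix) -> Matrix:
    # Assumes a non-zero matrix.
    norm = math.sqrt(sum(v * v for row in a for v in row))
    return [[v / norm for v in row] for row in a]


def sgformer_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Linear-order evaluation: the N x N score matrix is never formed."""
    n = len(x)
    q = frobenius_normalize(matmul(x, wq))
    k = frobenius_normalize(matmul(x, wk))
    v = matmul(x, wv)
    kt = transpose(k)
    ktv = matmul(kt, v)                 # K~^T V, d x d: O(N d^2)
    kt1 = [[sum(row)] for row in kt]    # K~^T 1, d x 1
    num = matmul(q, ktv)                # Q~ (K~^T V), N x d
    den = matmul(q, kt1)                # Q~ (K~^T 1), N x 1
    return [[(v[i][j] + num[i][j] / n) / (1.0 + den[i][0] / n) for j in range(len(v[0]))]
            for i in range(n)]


def sgformer_attention_pairwise(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Same formula with explicit N x N scores, O(N^2); for checking only."""
    n = len(x)
    q = frobenius_normalize(matmul(x, wq))
    k = frobenius_normalize(matmul(x, wk))
    v = matmul(x, wv)
    scores = matmul(q, transpose(k))    # q~_i . k~_m, no softmax
    out = []
    for i in range(n):
        d_ii = 1.0 + sum(scores[i]) / n  # diagonal entry of D
        out.append([(v[i][j] + sum(scores[i][m] * v[m][j] for m in range(n)) / n) / d_ii
                    for j in range(len(v[0]))])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
wq = [[1.0, 0.5], [0.2, 1.0]]
wk = [[0.3, 1.0], [1.0, 0.1]]
wv = [[1.0, 0.0], [0.0, 2.0]]
fast = sgformer_attention(x, wq, wk, wv)
slow = sgformer_attention_pairwise(x, wq, wk, wv)
# The difference should be at floating-point rounding level.
print(max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb)))
```

Both functions evaluate the same expression. Only the association order differs, and that order is what makes the global layer linear in $N$.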
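Local Layer (Generic GNN, Layer $\ell$):
$$H^{(\ell)} = \mathrm{GNN}^{(\ell)}\big(H_{\text{in}}^{(\ell)}, A\big), \quad \ell = 1, \dots, L$$
Layer Integration (Input per Local Layer):
$$H_{\text{in}}^{(\ell)} = M^{(\ell-1)}, \quad M^{(0)} = Z_G$$
where $M^{(\ell-1)}$ is the NOSAF running state after local layer $\ell - 1$.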
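For the generic local layer, the sketch below shows one plain GCN propagation step, $\mathrm{ReLU}(\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} H W)$. Cluster-GCN applies layers of this kind to subgraphs built from graph clusters, but its cluster sampling and normalization variants are not reproduced here. GatedGCN's edge gating is not modeled either. The name `gcn_layer` and the adjacency-list input format are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def gcn_layer(h: Matrix, adj: List[List[int]], w: Matrix) -> Matrix:
    """ReLU(D^-1/2 (A + I) D^-1/2 H W) for an undirected graph given as adjacency lists."""
    n, d_out = len(h), len(w[0])
    neighbourhoods = [set(adj[i]) | {i} for i in range(n)]  # A + I: add self-loops
    deg = [len(s) for s in neighbourhoods]                  # degrees including self-loop
    # Dense feature transform H W.
    hw = [[sum(h[i][k] * w[k][j] for k in range(len(w))) for j in range(d_out)]
          for i in range(n)]
    out = []
    for i in range(n):
        row = [0.0] * d_out
        for j in neighbourhoods[i]:                         # sparse aggregation over edges
            coeff = 1.0 / math.sqrt(deg[i] * deg[j])        # symmetric normalization
            for f in range(d_out):
                row[f] += coeff * hw[j][f]
        out.append([max(0.0, v) for v in row])              # ReLU
    return out


print(gcn_layer([[1.0], [3.0]], [[1], [0]], [[1.0]]))  # [[2.0], [2.0]]
```

The aggregation loop touches each edge once per output feature. This is the $O(|E|)$-per-feature cost that keeps the local stack linear on sparse graphs.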
5. Computational Complexity
The architecture is designed for linear efficiency at scale:
- The SGFormer-based global layer costs $O(Nd^2)$, which is linear in $N$. Without a softmax, the $d \times d$ product $\tilde{K}^{\top} V$ can be formed first, so the $N \times N$ attention matrix is never materialized.
- Each local GNN layer costs $O(|E|)$ for a fixed hidden dimension $d$, where $|E|$ is the number of edges; for sparse graphs, $|E| = O(N)$.
- The NOSAF fusion adds $O(Nd)$ cost per layer for fixed gating widths, which remains linear in $N$.
- The overall cost is therefore $O(N + |E|)$, i.e., $O(N)$ for sparse graphs. This matches the scalability of other linear-time GTs while, according to the authors, avoiding the over-globalization and over-smoothing that can affect them (Wang et al., 18 Sep 2025).
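The complexity bullets can be made concrete with rough multiply-add counts. The helper names `attention_cost` and `local_layer_cost` are hypothetical, constants are ignored, and real costs depend on implementation details and hardware.

```python
def attention_cost(n: int, d: int) -> dict:
    """Approximate multiply-adds for one attention layer, constant factors ignored."""
    return {
        # Softmax attention: ~N^2 d to build the N x N scores and ~N^2 d to apply them.
        "softmax_quadratic": n * n * d,
        # SGFormer ordering: ~N d^2 for K~^T V (d x d) and ~N d^2 for Q~ (K~^T V).
        "sgformer_linear": n * d * d,
    }


def local_layer_cost(num_edges: int, num_nodes: int, d: int) -> int:
    """Sparse aggregation over edges plus the dense per-node feature transform."""
    return num_edges * d + num_nodes * d * d


costs = attention_cost(n=100_000, d=128)
print(costs["softmax_quadratic"] / costs["sgformer_linear"])  # 781.25, i.e. N / d
```

Doubling $N$ doubles the linear-order count but quadruples the quadratic one. At $N = 100{,}000$ and $d = 128$, the gap is already a factor of $N/d \approx 781$.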
6. Hyperparameters and Implementation
Key hyperparameters and design choices include:
| Parameter | Typical Value/Choice | Notes |
|---|---|---|
| Global layers ($L_G$) | 1 | Shallow, single-head SGFormer |
| Local layers ($L$) | 2–4 | Dataset dependent |
| Feature dimension ($d$) | 128 or 256 | Input and hidden feature size |
| NOSAF dims ($d_1$, $d_2$) | 32, 16 | Control fusion gating capacity |
| FFN inner hidden | | Feedforward network width |
| Attention heads | 1 | Single-head SGFormer attention |
| Local GNNs | Cluster-GCN, GatedGCN | Node-level and graph-level tasks, respectively |
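The hyperparameter choices in the table can be collected into a configuration object. `G2LFormerConfig` is a hypothetical Python dataclass whose fields and defaults mirror the table. The field names and validation rules are assumptions, not part of any released code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class G2LFormerConfig:
    task: str = "node"                      # "node" (node-level) or "graph" (graph-level)
    global_layers: int = 1                  # one shallow SGFormer layer
    local_layers: int = 2                   # typically 2-4, dataset dependent
    hidden_dim: int = 128                   # 128 or 256
    nosaf_dims: Tuple[int, int] = (32, 16)  # fusion gating widths
    attention_heads: int = 1                # single-head SGFormer attention

    def __post_init__(self) -> None:
        if self.task not in ("node", "graph"):
            raise ValueError(f"task must be 'node' or 'graph', got {self.task!r}")
        if self.attention_heads != 1:
            raise ValueError("the SGFormer global layer is single-head")

    @property
    def local_gnn(self) -> str:
        # Cluster-GCN for node-level tasks, GatedGCN for graph-level tasks.
        return "Cluster-GCN" if self.task == "node" else "GatedGCN"


node_cfg = G2LFormerConfig()
graph_cfg = G2LFormerConfig(task="graph", hidden_dim=256, local_layers=4)
print(node_cfg.local_gnn, graph_cfg.local_gnn)  # Cluster-GCN GatedGCN
```

Deriving the local GNN type from the task, rather than storing it separately, keeps the node-level/graph-level pairing from the table consistent by construction.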
The forward computation follows:
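$$X \xrightarrow{\ \text{SGFormer}\ } Z_G = M^{(0)} \xrightarrow{\ \mathrm{GNN}^{(1)} + \text{NOSAF}\ } M^{(1)} \longrightarrow \cdots \xrightarrow{\ \mathrm{GNN}^{(L)} + \text{NOSAF}\ } M^{(L)} \xrightarrow{\ \text{head}\ } \hat{Y}$$
where $M^{(\ell)}$ denotes the NOSAF running state after local layer $\ell$ and $\hat{Y}$ is the task prediction.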
A plausible implication is that the G2LFormer structure allows precise control over feature propagation depth and locality, which could be advantageous for graphs exhibiting heterogeneous mixing patterns.
7. Context and Significance
G2LFormer provides an architectural alternative to conventional GTs and addresses two primary challenges. The first is over-globalization, in which global attention dilutes or masks crucial neighborhood structure. The second is the loss of global information, together with over-smoothing, in schemes that stack local and global layers. G2LFormer places the global SGFormer layer first and equips the subsequent local GNN layers with a selective fusion mechanism, balancing long-range expressivity with structural fidelity. This suggests a pathway toward scalable, expressive graph models that combine competitive accuracy with practical runtime on both node-level and graph-level tasks (Wang et al., 18 Sep 2025). The empirical study underlying G2LFormer reports that this design outperforms state-of-the-art linear GTs and GNNs in both accuracy and efficiency on the benchmarks evaluated.