G2LFormer: Global-to-Local Graph Learning

Updated 27 April 2026
  • G2LFormer is a graph learning architecture that integrates a global SGFormer layer with local GNN refinement to capture both long-range and local features.
  • The design employs a dedicated cross-layer fusion module (NOSAF) to selectively merge global context with topology-aware information.
  • The design mitigates over-smoothing and over-globalization while scaling linearly on sparse graphs.

G2LFormer is a graph learning architecture characterized by a global-to-local attention mechanism, designed to address information loss in graph transformers (GTs) that integrate Graph Neural Networks (GNNs) with global attention layers. Unlike prior schemes that apply local aggregation before, after, or in tandem with global attention, G2LFormer reverses the direction: global information is provided first, followed by topology-aware local GNN refinement. This ordering, together with a dedicated cross-layer fusion strategy, enables effective learning of both long-range and neighborhood structure while preserving computational efficiency and mitigating over-smoothing and over-globalization (Wang et al., 18 Sep 2025).

1. Global-to-Local Network Architecture

G2LFormer processes node features $X \in \mathbb{R}^{N \times d}$ and adjacency $A$ using the following structure:

  • Global Layer: A single shallow global-attention layer, implemented as a variant of SGFormer, computes an embedding $h_{TL}$ that captures long-range node interactions across the entire graph.
  • Cross-Layer Fusion Module: The NOSAF (Node- and Output-Selective Accumulative Fusion) module propagates $h_{TL}$ into a stack of local GNN layers, providing a globally informed initial context.
  • Local Stack: Downstream, $n$ local GNN layers (Cluster-GCN for node-level tasks or GatedGCN for graph-level tasks) operate on top of the globally aware representation. These layers emphasize structural aggregation over the graph topology and produce the final embedding $h_{GL}$.

The data flow can be summarized as:

| Stage | Input | Output |
|---|---|---|
| Global Attention (SGFormer) | $X$, $A$ | $h_{TL}$ |
| Cross-Layer Fusion (NOSAF) | $h_{TL}$ | Fused initial representation |
| Local GNN Stack (Cluster-GCN / GatedGCN) | Fused representation | $h_{GL}$ |

Each local layer's computation is modulated by the fusion mechanism, allowing the retention and controlled propagation of global information.

2. Global-to-Local Attention Scheme

This attention scheme inverts common GT integration patterns. Instead of local-to-global or parallel local+global, G2LFormer applies all-pair global attention to node features before any message-passing occurs. Specifically:

  • The single global layer leverages SGFormer with a linearized attention mechanism, ensuring that every node incorporates long-range context.
  • The subsequent local GNN layers refine these representations using topology-aware aggregation. Because nodes do not start from purely local features, the over-smoothing typical of deep GNNs is mitigated.
  • The reverse ordering addresses previous deficiencies where local information was overwhelmed ("over-globalization") or where valuable global structure was lost due to stacking schemes.

This sequence ensures that long-range dependencies are available upfront, yet local neighborhoods remain influential during further propagation.
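To make the "linearized attention" point concrete, the sketch below compares a quadratic all-pair attention with its reordered linear form, using the softmax-free attention shape published with SGFormer. It is a minimal NumPy illustration, not the paper's implementation: the function names, the sizes, and the random inputs are assumptions chosen for the example.

```python
import numpy as np

def normalize(m):
    """Scale a matrix to unit Frobenius norm."""
    return m / np.linalg.norm(m)

def attention_quadratic(q, k, v):
    """Reference form: materializes the full N x N attention matrix."""
    n = q.shape[0]
    scores = np.eye(n) + (q @ k.T) / n                  # N x N
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

def attention_linear(q, k, v):
    """Same result, but K^T V (d x d) is formed first, so cost grows linearly in N."""
    n = q.shape[0]
    numerator = v + q @ (k.T @ v) / n                   # N x d
    denominator = 1.0 + q @ k.sum(axis=0) / n           # length N
    return numerator / denominator[:, None]

rng = np.random.default_rng(0)
N, d = 200, 16                                          # illustrative sizes
q = normalize(rng.standard_normal((N, d)))
k = normalize(rng.standard_normal((N, d)))
v = rng.standard_normal((N, d))

out_quadratic = attention_quadratic(q, k, v)
out_linear = attention_linear(q, k, v)
max_gap = float(np.abs(out_quadratic - out_linear).max())
```

Both functions compute the same all-pair interaction; only the order of the matrix products differs, which is what lets every node see global context without an N x N intermediate.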

3. Cross-Layer Information Fusion: NOSAF

NOSAF introduces a selective fusion pathway between the global and local stacks. It maintains a running state and performs node-wise adaptive blending at each local layer. The core steps per local layer are:

  • Summary Formation: Concatenate the accumulated state with the current layer's embeddings to form a per-node summary.
  • Node-wise Gating: Compute a gating vector from this summary.
  • Feature Filtering: Apply the gate to the local layer's output.
  • State Update: Accumulate the gated features into the running state.

The gate is applied as a Hadamard (element-wise) product with broadcasting, which makes the blending node-wise adaptive. The filtering lets local embeddings retain useful global and prior local information, reducing the risk of signal dilution or over-aggregation.
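The four steps can be sketched in a few lines of NumPy. This is one plausible reading, not the paper's definition: the two-layer gate MLP, the sigmoid activation, the scalar-per-node gate, and the names `nosaf_step`, `w1`, and `w2` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nosaf_step(state, h_local, w1, w2):
    """One hypothetical NOSAF step for a single local layer.

    state:   (N, d) running accumulation from the global layer and earlier local layers
    h_local: (N, d) output of the current local GNN layer
    w1, w2:  weights of an assumed two-layer gate MLP, shapes (2d, d1) and (d1, 1)
    """
    summary = np.concatenate([state, h_local], axis=1)  # summary formation, (N, 2d)
    hidden = np.maximum(summary @ w1, 0.0)               # ReLU, (N, d1)
    gate = sigmoid(hidden @ w2)                          # node-wise gate in (0, 1), (N, 1)
    filtered = gate * h_local                            # Hadamard product with broadcast
    new_state = state + filtered                         # state update
    return new_state, filtered, gate

rng = np.random.default_rng(0)
N, d, d1 = 5, 8, 4                                       # illustrative sizes
h_global = rng.standard_normal((N, d))                   # stands in for h_TL
h_local = rng.standard_normal((N, d))
w1 = rng.standard_normal((2 * d, d1)) * 0.1
w2 = rng.standard_normal((d1, 1)) * 0.1

state, filtered, gate = nosaf_step(h_global, h_local, w1, w2)
```

Seeding the running state with the global embedding is what lets each node decide, layer by layer, how much of the new local signal to add on top of the global context.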

4. Principal Formulations

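Global Layer (SGFormer):

  • Projection: Map the input node features to queries, keys, and values.
  • Normalization: Normalize the queries and keys.
  • Scaling factor: Compute a per-node normalizer for the attention output.
  • Output: Apply the normalized linear attention to the values to obtain $h_{TL}$.

Local Layer (Generic GNN):

Each local layer updates the node embeddings by aggregating over the neighborhoods defined by $A$.

Layer Integration (Input per Local Layer):

Each local layer receives the NOSAF-fused representation rather than raw node features, so global context remains available at every depth.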
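For orientation, the simple global attention published with SGFormer has the shape below, which lines up with the four global-layer steps (projection, normalization, scaling factor, output). The symbols $f_Q$, $f_K$, $f_V$, $\tilde{Q}$, $\tilde{K}$, and $D$ follow the SGFormer formulation and are assumptions here. G2LFormer uses a variant of that layer, so details such as residual terms may differ.

```latex
\begin{aligned}
Q &= f_Q(X), \quad K = f_K(X), \quad V = f_V(X) && \text{(projection)} \\
\tilde{Q} &= \frac{Q}{\lVert Q \rVert_F}, \quad \tilde{K} = \frac{K}{\lVert K \rVert_F} && \text{(normalization)} \\
D &= \operatorname{diag}\!\left(\mathbf{1} + \frac{1}{N}\,\tilde{Q}\,\bigl(\tilde{K}^{\top}\mathbf{1}\bigr)\right) && \text{(scaling factor)} \\
h_{TL} &= D^{-1}\!\left[\,V + \frac{1}{N}\,\tilde{Q}\,\bigl(\tilde{K}^{\top} V\bigr)\right] && \text{(output)}
\end{aligned}
```

Evaluating $\tilde{K}^{\top} V$ before multiplying by $\tilde{Q}$ keeps every intermediate at size $N \times d$ or $d \times d$, which is the source of the linear cost in $N$.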

5. Computational Complexity

The architecture is designed for linear efficiency at scale:

  • The SGFormer-based global layer has cost linear in the number of nodes $N$, due to the linear kernelization of the all-pair attention computation.
  • Each local GNN layer operates at $O(|E|)$, where $|E|$ is the number of edges, which is $O(N)$ for sparse graphs.
  • The NOSAF fusion adds a node-wise gating cost per layer, which remains practical because the gating dimensions are small (see Section 6).
  • The overall cost is $O(N)$ for sparse graphs, matching the scalability of other linear-time GTs without incurring their characteristic over-globalization or over-smoothing (Wang et al., 18 Sep 2025).
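A back-of-the-envelope operation count shows how these terms scale. The counting model is a deliberate simplification and an assumption of this sketch (multiply-adds only, an average degree of 10, constant factors ignored), not a measurement from the paper.

```python
def quadratic_attention_ops(n, d):
    """Multiply-adds for (Q K^T) V: an N x N score matrix, then N x N times N x d."""
    return 2 * n * n * d

def linear_attention_ops(n, d):
    """Multiply-adds for Q (K^T V): a d x d summary, then N x d times d x d."""
    return 2 * n * d * d

def message_passing_ops(num_edges, d):
    """One d-dimensional accumulation per edge."""
    return num_edges * d

def layer_budget(n, d, avg_degree):
    """Rough per-layer operation counts for a sparse graph (assumed average degree)."""
    num_edges = avg_degree * n
    return {
        "quadratic_attention": quadratic_attention_ops(n, d),
        "linear_attention": linear_attention_ops(n, d),
        "message_passing": message_passing_ops(num_edges, d),
    }

small = layer_budget(n=10_000, d=128, avg_degree=10)
large = layer_budget(n=100_000, d=128, avg_degree=10)
growth = {name: large[name] / small[name] for name in small}
```

With ten times more nodes, the linear-attention and message-passing counts grow tenfold, while the quadratic-attention count grows a hundredfold.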

6. Hyperparameters and Implementation

Key hyperparameters and design choices include:

| Parameter | Typical Value/Choice | Notes |
|---|---|---|
| Global layers | 1 | Shallow single-head SGFormer |
| Local layers ($n$) | 2–4 | Dataset dependent |
| Feature dimension ($d$) | 128 or 256 | Input and hidden feature size |
| NOSAF dims | 32, 16 | Control fusion gating capacity |
| FFN inner hidden | [value missing] | Feedforward network width |
| Attention heads | 1 | SGFormer restricts to a single head |
| Local GNNs | Cluster-GCN, GatedGCN | For node-level and graph-level tasks, respectively |

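The forward computation follows the data flow of Section 1: the global layer maps $X$ and $A$ to $h_{TL}$, NOSAF initializes the fused state from $h_{TL}$, and each of the $n$ local layers then applies a GNN update followed by NOSAF gating and accumulation, yielding $h_{GL}$.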
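A compact end-to-end sketch of this forward pass is given below. It rests on several assumptions: the local layer is a plain normalized-aggregation GNN standing in for Cluster-GCN or GatedGCN, the gate is a hypothetical two-layer MLP, each local layer reads the running fused state, and all names (`g2l_forward`, `params`, and the helper functions) are illustrative rather than taken from the paper.

```python
import numpy as np

def global_layer(x, wq, wk, wv):
    """Single-head linear all-pair attention in the style of SGFormer."""
    n = x.shape[0]
    q, k, v = x @ wq, x @ wk, x @ wv
    q = q / np.linalg.norm(q)                    # Frobenius normalization
    k = k / np.linalg.norm(k)
    numerator = v + q @ (k.T @ v) / n
    denominator = 1.0 + q @ k.sum(axis=0) / n
    return numerator / denominator[:, None]

def local_layer(h, adj_norm, w):
    """Stand-in local GNN: normalized neighbor aggregation, linear map, ReLU."""
    return np.maximum(adj_norm @ h @ w, 0.0)

def nosaf_gate(state, h, w1, w2):
    """Assumed gate: MLP over the concatenated summary, one value per node."""
    summary = np.concatenate([state, h], axis=1)
    hidden = np.maximum(summary @ w1, 0.0)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # (N, 1)

def g2l_forward(x, adj, params):
    """Global layer first, then gated local refinement."""
    a = adj + np.eye(adj.shape[0])               # add self-loops
    adj_norm = a / a.sum(axis=1, keepdims=True)  # row-normalize
    state = global_layer(x, *params["global"])   # h_TL seeds the running state
    for w, w1, w2 in params["local"]:
        h = local_layer(state, adj_norm, w)      # topology-aware update
        state = state + nosaf_gate(state, h, w1, w2) * h
    return state                                 # plays the role of h_GL

rng = np.random.default_rng(0)
N, d, d1, n_local = 6, 16, 8, 3                  # illustrative sizes
adj = np.zeros((N, N))
for i in range(N - 1):                           # path graph 0-1-2-3-4-5
    adj[i, i + 1] = adj[i + 1, i] = 1.0

params = {
    "global": [rng.standard_normal((d, d)) * 0.1 for _ in range(3)],
    "local": [
        (rng.standard_normal((d, d)) * 0.1,
         rng.standard_normal((2 * d, d1)) * 0.1,
         rng.standard_normal((d1, 1)) * 0.1)
        for _ in range(n_local)
    ],
}
x = rng.standard_normal((N, d))
h_gl = g2l_forward(x, adj, params)
```

Changing the length of `params["local"]` changes the number of local layers, which is the main depth knob listed in the table above.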

A plausible implication is that the G2LFormer structure allows precise control over feature propagation depth and locality, which could be advantageous for graphs exhibiting heterogeneous mixing patterns.

7. Context and Significance

G2LFormer provides an architectural alternative to conventional GTs, addressing two primary challenges: dilution of local information by global attention, and the risk that wide-scale attention masks crucial neighborhood structure. By placing the global SGFormer layer upfront and equipping subsequent local GNN layers with a selective fusion mechanism, the architecture balances long-range expressivity with structural fidelity. This suggests a pathway toward scalable, expressive graph models that maintain both competitive accuracy and practical runtime, with applicability to both node-level and graph-level tasks (Wang et al., 18 Sep 2025). The empirical study underlying G2LFormer demonstrates that this design outperforms state-of-the-art linear GTs and GNNs in terms of performance and efficiency on benchmark tasks.

References (1)
