FloydNet: DP-Style Global Graph Learning
- FloydNet is a graph learning architecture that uses global DP-style iterative refinement to capture complex combinatorial and relational reasoning.
- It employs a dense all-pairs relationship tensor with a learnable operator inspired by the Floyd-Warshall algorithm to achieve high-order expressive power.
- Empirical results show state-of-the-art performance on benchmarks including CLRS-30, TSP, BREC, and molecule property prediction.
FloydNet is a graph learning architecture that realizes global, dynamic programming (DP)-style iterative refinement for combinatorial, algorithmic, and relational reasoning tasks. It departs from message-passing graph neural networks (MPNNs) by operating on a dense, all-pairs relationship representation, using a learnable operator inspired by the Floyd-Warshall algorithm. FloydNet attains higher-order expressive power, precisely implements generalized k-Folklore Weisfeiler-Lehman (k-FWL) color refinement, and achieves state-of-the-art empirical performance across a suite of challenging benchmarks, including CLRS-30, BREC, Traveling Salesman Problem (TSP), and molecule property prediction (Yu et al., 27 Jan 2026).
1. Global All-Pairs Representation
At the core of FloydNet is a dense, global "relationship tensor" maintained at every layer $l$:

$$R^{(l)} \in \mathbb{R}^{N \times N \times d}.$$

Here, $N$ is the number of graph nodes and $d$ is the hidden dimension. Each entry $R^{(l)}_{ij}$ encodes the current embedding of the relationship between nodes $i$ and $j$. The initialization aggregates node features $x_i$ and $x_j$, edge features $e_{ij}$, and global features $g$ using a multi-layer perceptron:

$$R^{(0)}_{ij} = \mathrm{MLP}\big([\,x_i \,\|\, x_j \,\|\, e_{ij} \,\|\, g\,]\big).$$
Such a design enables direct modeling of long-range and high-order dependencies, in contrast to local aggregation schemes in standard GNNs.
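As an illustration, the initialization step can be sketched in NumPy. The concatenation order, two-layer MLP shape, and all dimensions here are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(z, W1, b1, W2, b2):
    """Two-layer MLP with ReLU, applied along the last axis."""
    return np.maximum(z @ W1 + b1, 0.0) @ W2 + b2

N, d_node, d_edge, d_glob, d = 5, 4, 3, 2, 8
x = rng.normal(size=(N, d_node))     # node features x_i
e = rng.normal(size=(N, N, d_edge))  # edge features e_ij
g = rng.normal(size=(d_glob,))       # global features g

# Concatenate [x_i, x_j, e_ij, g] for every ordered pair (i, j).
xi = np.broadcast_to(x[:, None, :], (N, N, d_node))
xj = np.broadcast_to(x[None, :, :], (N, N, d_node))
gg = np.broadcast_to(g, (N, N, d_glob))
z = np.concatenate([xi, xj, e, gg], axis=-1)  # [N, N, d_in]

d_in = z.shape[-1]
W1, b1 = rng.normal(size=(d_in, d)), np.zeros(d)
W2, b2 = rng.normal(size=(d, d)), np.zeros(d)
R0 = mlp(z, W1, b1, W2, b2)  # initial relationship tensor, [N, N, d]
print(R0.shape)  # (5, 5, 8)
```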
2. FloydBlock: Learned DP-Style Refinement
Each FloydNet layer, termed a "FloydBlock," performs a global update of the relationship tensor using a learnable analogue of the Floyd-Warshall update

$$D_{ik} \leftarrow \min_{j}\big(D_{ik},\; D_{ij} + D_{jk}\big).$$
In FloydNet, the scalar min and addition are replaced with high-dimensional attention-based operators ("Pivotal Attention"). Given $R^{(l-1)}$, normalized as $\bar{R} = \mathrm{LayerNorm}(R^{(l-1)})$:
- Pairwise queries $Q_{ik}$, keys $K_{ij}$ and $K_{jk}$, and values $V_{ij}$ and $V_{jk}$ are linearly projected for each path $i \to j \to k$.
- Keys and values along two-hop paths are combined via an operator $C$ (element-wise addition, default): $K_{ijk} = C(K_{ij}, K_{jk})$, $V_{ijk} = C(V_{ij}, V_{jk})$.
- Scaled dot-product attention over all pivot nodes $j$ computes:

$$O_{ik} = \sum_{j} \mathrm{softmax}_j\!\left(\frac{Q_{ik} \cdot K_{ijk}}{\sqrt{d_h}}\right) V_{ijk}.$$

The full FloydBlock updates via residual connections and a feed-forward network:

$$R' = R^{(l-1)} + O, \qquad R^{(l)} = R' + \mathrm{FFN}(\mathrm{LayerNorm}(R')).$$
This global pattern can be interpreted as learning a task-specific relational calculus. Because each FloydBlock composes two-hop paths, $L$ layers cover paths of length up to $2^L$, enabling long-range reasoning in $O(\log N)$ layers through exponential receptive-field growth.
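For reference, the scalar dynamic program that Pivotal Attention generalizes is the classic Floyd–Warshall recurrence, with min replaced by attention over pivots and + replaced by the learned combine operator:

```python
import math

def floyd_warshall(dist):
    """Classic all-pairs shortest paths: d[i][k] = min(d[i][k], d[i][j] + d[j][k])."""
    n = len(dist)
    d = [row[:] for row in dist]
    for j in range(n):  # pivot node: the role attention ranges over in FloydNet
        for i in range(n):
            for k in range(n):
                if d[i][j] + d[j][k] < d[i][k]:
                    d[i][k] = d[i][j] + d[j][k]
    return d

INF = math.inf
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
print(floyd_warshall(graph))
```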
3. Expressive Power and Theoretical Properties
FloydNet operating on node pairs directly implements the 2-Folklore WL (2-FWL) color refinement, which is equivalent in expressive power to the 3-WL test. Each pairwise embedding $R_{ik}$ is refined by attending over the multiset of all length-2 paths $\{(R_{ij}, R_{jk})\}_{j}$, matching the combinatorial update rule of 3-WL (Yu et al., 27 Jan 2026).
In general, $k$-FloydNet can update each $k$-tuple via attention over all pivots, effectively realizing the $k$-Folklore WL test. This positions FloydNet firmly within the $k$-FWL hierarchy and endows the architecture with expressive power strictly beyond 1-WL MPNNs and previously established message-passing GNNs. The achievable expressive power can thus be systematically increased via the tuple size $k$, providing a theoretically principled mechanism for higher-order reasoning.
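The refinement FloydNet emulates can be made concrete with a toy 2-FWL implementation; the nested-tuple "hashing" below is a simplification for illustration. It separates a 6-cycle from two disjoint triangles, a pair of 2-regular graphs that 1-WL cannot distinguish:

```python
from itertools import product

def two_fwl_colors(adj, rounds=2):
    """2-FWL refinement on ordered node pairs: each pair (i, k) is updated
    with the multiset over pivots j of (color(i, j), color(j, k)),
    mirroring FloydNet's attention over all pivots."""
    n = len(adj)
    # Initial pair colors: (is i == j, is there an edge i-j)
    color = {(i, j): (i == j, adj[i][j]) for i, j in product(range(n), repeat=2)}
    for _ in range(rounds):
        color = {
            (i, k): (color[(i, k)],
                     tuple(sorted((color[(i, j)], color[(j, k)]) for j in range(n))))
            for i, k in product(range(n), repeat=2)
        }
    return sorted(map(repr, color.values()))  # canonical color histogram

def cycle(n):
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        a[i][(i + 1) % n] = a[(i + 1) % n][i] = 1
    return a

c6 = cycle(6)                       # one 6-cycle
two_c3 = [[0] * 6 for _ in range(6)]  # two disjoint triangles
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]:
    two_c3[i][j] = two_c3[j][i] = 1

# Both graphs are 2-regular, so 1-WL cannot separate them; 2-FWL can.
print(two_fwl_colors(c6) != two_fwl_colors(two_c3))  # True
```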
4. Model Architecture and Implementation
FloydNet is structured as a stack of FloydBlocks, each utilizing the Pre-LN Transformer pattern:
- PivotalAttention: multi-head, with head dimension $d_h$.
- Feed-forward network (FFN) with GeLU activation.
- Normalization: LayerNorm by default; BatchNorm, RMSNorm, and QK-Norm are also supported.
- Combine operator $C$: additive (default), multiplicative (for geometric tasks).
- CUDA kernel: optimized to reduce attention memory from $O(N^3)$ to $O(N^2)$.
A prototypical iterative refinement pseudocode:
```
for l in 1..L:
    Rm1 = LayerNorm(R^(l-1))
    Q  = W^Q · Rm1        # [N, N, h, d_h]
    K1 = W^K · Rm1
    V1 = W^V · Rm1
    for each head h:
        for i in 1..N, k in 1..N:
            q = Q[i, k, h, :]
            for j in 1..N:
                k_path[j] = C(K1[i, j, h, :], K1[j, k, h, :])
                v_path[j] = C(V1[i, j, h, :], V1[j, k, h, :])
                a[j] = q · k_path[j] / sqrt(d_h)
            w = softmax(a)                     # normalize over pivots j
            O[i, k, h, :] = sum_j w[j] * v_path[j]
    R' = R^(l-1) + concat_heads(O)
    R^(l) = R' + FFN(LayerNorm(R'))
return R^(L)
```
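A vectorized, single-head rendering of this loop (additive combine, LayerNorm without learned affine parameters, residual only, no FFN) might look as follows in NumPy; this is a sketch of the computation, not the released implementation:

```python
import numpy as np

def floyd_block(R, Wq, Wk, Wv, d_h):
    """One single-head FloydBlock step: every pair (i, k) attends over all
    pivots j with key/value C(K[i,j], K[j,k]) = K[i,j] + K[j,k]."""
    # LayerNorm over the feature axis (no learned affine parameters)
    Rn = (R - R.mean(-1, keepdims=True)) / (R.std(-1, keepdims=True) + 1e-5)
    Q, K, V = Rn @ Wq, Rn @ Wk, Rn @ Wv           # each [N, N, d_h]
    K_path = K[:, :, None, :] + K[None, :, :, :]  # [i, j, k, d] = K[i,j] + K[j,k]
    V_path = V[:, :, None, :] + V[None, :, :, :]
    logits = np.einsum('ikd,ijkd->ikj', Q, K_path) / np.sqrt(d_h)
    w = np.exp(logits - logits.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                 # softmax over pivots j
    O = np.einsum('ikj,ijkd->ikd', w, V_path)
    return R + O                                  # residual connection

rng = np.random.default_rng(0)
N, d = 4, 8
R = rng.normal(size=(N, N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = floyd_block(R, Wq, Wk, Wv, d_h=d)
print(out.shape)  # (4, 4, 8)
```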
5. Training Regimes and Hyperparameters
FloydNet is trained under domain-specific configurations:
- BREC (graph isomorphism): single-head Pivotal Attention, BatchNorm, FFN removed, float64 precision, AdamW, batch size 64, no positional encodings.
- CLRS-30 (algorithmic reasoning): 6 heads, AdamW with linear warmup and cosine decay, up to 80k steps, tested out-of-distribution on graph sizes beyond those seen in training.
- TSP (combinatorial optimization): 6 heads, DDPM formulation (binary cross-entropy on edge variables), 400 epochs × 100 steps, 64 GPUs, batch size 1 with gradient accumulation, trained on small instances and tested on larger ones, with optimality evaluated against Concorde solutions.
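The linear-warmup-plus-cosine-decay schedule mentioned for CLRS-30 can be written as a small function; the warmup length and peak learning rate below are placeholder values, not the paper's settings:

```python
import math

def lr_schedule(step, max_steps, base_lr, warmup_steps):
    """Linear warmup to base_lr, then cosine decay to zero at max_steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Peak at the end of warmup, zero at the final step (80k steps as in the text).
print(lr_schedule(999, 80_000, 1e-3, 1000))     # 0.001
print(lr_schedule(80_000, 80_000, 1e-3, 1000))  # ~0.0
```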
6. Empirical Evaluation
FloydNet establishes strong or state-of-the-art empirical results across domains:
- Homomorphism Counting: Near-zero mean absolute error on all 8 tasks, surpassing GIN, Subgraph-GNN, 2-GNN, 2-FGNN (all 2-WL).
- BREC: FloydNet (2-FWL/3-WL expressive) reaches 67.5% accuracy, versus markedly lower scores for 1-WL GNNs, 19.8% for Graphormer, 58.5% for PPGT, and 68.8% for KP-GNN; higher-order $k$-FloydNet variants reach 95.0% and 99.8%, matching 4-WL.
- CLRS-30: Aggregated test accuracy (%):
| Class | Triplet-GMPNN | RANR | G-ForgetNet | RT | ET | FloydNet |
|---|---|---|---|---|---|---|
| Sort (4) | 75.6 | 94.2 | 78.1 | 50.0 | 82.3 | 100.0 |
| Search (3) | 58.8 | 82.9 | 63.8 | 65.3 | 63.0 | 91.6 |
| Greedy (2) | 76.4 | 83.5 | 91.8 | 85.3 | 81.7 | 93.2 |
| DP (3) | 82.0 | 42.7 | 86.7 | 83.2 | 83.5 | 90.0 |
| Graph (12) | 86.4 | 74.2 | 88.8 | 65.3 | 86.1 | 98.6 |
| String (2) | 49.1 | 49.1 | 54.7 | 32.5 | 54.8 | 99.7 |
| Geometry (3) | 94.1 | 88.4 | 95.1 | 84.6 | 88.2 | 99.7 |
| Total (30) | 80.0 | 75.8 | 82.9 | 66.2 | 80.1 | 96.6 |
- FloydNet maintains >90% accuracy on substantially larger out-of-distribution graphs in the no-hint setting, while hint-augmented models often degrade or run out of memory.
- TSP: the Linkern heuristic attains 38.8% optimality in general and 16.1% on large instances; FloydNet (10 samples) reaches 99.8% overall and 99.4% on large instances.
- LRGB/ZINC: Competitive or superior on selected molecule and vision tasks (e.g., ZINC-full MAE 0.016).
7. Comparison with Message Passing Paradigms and Future Directions
MPNNs propagate information solely along local graph edges, causing over-squashing and constraining expressivity to 1-WL (sometimes 2-WL with extensions). FloydNet, by contrast, leverages a global pairwise representation and DP-style refinement, resulting in exponentially fast receptive-field growth (path lengths up to $2^L$ after $L$ layers) and provable 3-WL (2-FWL) expressiveness.
The major tradeoff is increased computational and memory cost: FloydNet's standard implementation incurs cubic complexity $O(N^3)$ in the number of nodes. This is mitigated for practical graph sizes with optimized kernels. Its strengths include exact emulation of dynamic programming routines (e.g., Floyd–Warshall) and superior suitability for tasks requiring long-range or combinatorial reasoning.
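A back-of-the-envelope accounting illustrates why kernel optimization matters; the specific buffers counted here are assumptions for illustration, not the actual kernel's layout:

```python
def pivotal_attention_elements(N, d_h, streamed=False):
    """Illustrative live-element counts for one attention head (assumed buffers):
    a naive kernel materializes the full [N, N, N] pivot logits plus combined
    key/value paths; a streamed kernel keeps only per-pair accumulators."""
    if streamed:
        return N * N * (d_h + 2)        # output accumulator + running max/denominator
    return N * N * N * (1 + 2 * d_h)    # logits + K_path + V_path over all pivots

naive = pivotal_attention_elements(1000, 64)
stream = pivotal_attention_elements(1000, 64, streamed=True)
print(naive // stream)  # the cubic-vs-quadratic gap grows linearly with N
```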
Current limitations include the cubic scaling and tailoring to moderate-size dense graphs. Open research areas include sparse pivot selection, approximate DP refinement, and multimodal extensions. FloydNet establishes learned DP refinement as a high-expressivity, empirically strong, and theoretically principled alternative to local message passing for global graph reasoning tasks (Yu et al., 27 Jan 2026).