
FloydNet: DP-Style Global Graph Learning

Updated 23 February 2026
  • FloydNet is a graph learning architecture that uses global DP-style iterative refinement to capture complex combinatorial and relational reasoning.
  • It employs a dense all-pairs relationship tensor with a learnable operator inspired by the Floyd-Warshall algorithm to achieve high-order expressive power.
  • Empirical results show state-of-the-art performance on benchmarks including CLRS-30, TSP, BREC, and molecule property prediction.

FloydNet is a graph learning architecture that realizes global, dynamic programming (DP)-style iterative refinement for combinatorial, algorithmic, and relational reasoning tasks. It departs from message-passing graph neural networks (MPNNs) by operating on a dense, all-pairs relationship representation, using a learnable operator inspired by the Floyd-Warshall algorithm. FloydNet attains higher-order expressive power, precisely implements generalized k-Folklore Weisfeiler-Lehman (k-FWL) color refinement, and achieves state-of-the-art empirical performance across a suite of challenging benchmarks, including CLRS-30, BREC, Traveling Salesman Problem (TSP), and molecule property prediction (Yu et al., 27 Jan 2026).

1. Global All-Pairs Representation

At the core of FloydNet is a dense, global "relationship tensor" maintained at every layer $l$:

$$\mathbf{R}^{(l)} \in \mathbb{R}^{N \times N \times d_r}$$

Here, $N$ is the number of graph nodes and $d_r$ is the hidden dimension. Each entry $\mathbf{R}^{(l)}_{i,k} \in \mathbb{R}^{d_r}$ encodes the current embedding of the relationship between nodes $i$ and $k$. The initialization $\mathbf{R}^{(0)}$ aggregates node features $\mathbf{X}_i \in \mathbb{R}^{d_n}$, edge features $\mathbf{E}_{i,k} \in \mathbb{R}^{d_e}$, and global features $\mathbf{G} \in \mathbb{R}^{d_g}$ using a multi-layer perceptron:

$$\mathbf{R}^{(0)}_{i,k} = \mathrm{MLP}_{\text{init}}([\mathbf{G}, \mathbf{X}_i, \mathbf{X}_k, \mathbf{E}_{i,k}])$$

Such a design enables direct modeling of long-range and high-order dependencies, in contrast to local aggregation schemes in standard GNNs.
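The initialization above can be sketched in a few lines of NumPy. The dimensions are toy values and `mlp_init` is a hypothetical two-layer stand-in for $\mathrm{MLP}_{\text{init}}$, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not from the paper).
N, d_n, d_e, d_g, d_r = 5, 8, 4, 3, 16

X = rng.normal(size=(N, d_n))        # node features X_i
E = rng.normal(size=(N, N, d_e))     # edge features E_ik
G = rng.normal(size=(d_g,))          # global features G

def mlp_init(z, W1, b1, W2, b2):
    """Two-layer ReLU MLP standing in for MLP_init."""
    h = np.maximum(z @ W1 + b1, 0.0)
    return h @ W2 + b2

d_in = d_g + 2 * d_n + d_e
W1, b1 = rng.normal(size=(d_in, d_r)), np.zeros(d_r)
W2, b2 = rng.normal(size=(d_r, d_r)), np.zeros(d_r)

# Concatenate [G, X_i, X_k, E_ik] for every ordered pair (i, k).
Gb = np.broadcast_to(G, (N, N, d_g))
Xi = np.broadcast_to(X[:, None, :], (N, N, d_n))
Xk = np.broadcast_to(X[None, :, :], (N, N, d_n))
Z = np.concatenate([Gb, Xi, Xk, E], axis=-1)   # [N, N, d_in]

R0 = mlp_init(Z, W1, b1, W2, b2)               # dense all-pairs tensor R^(0)
assert R0.shape == (N, N, d_r)
```

Note that $\mathbf{R}^{(0)}$ is dense: every ordered pair gets an embedding, whether or not an edge exists.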

2. FloydBlock: Learned DP-Style Refinement

Each FloydNet layer, termed a "FloydBlock," performs a global update of the relationship tensor using a learnable analogue of the Floyd-Warshall update:

$$D_{ik} \leftarrow \min_j \left(D_{ij} + D_{jk}\right)$$
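The classical (non-learned) update can be run directly; a minimal NumPy sketch of Floyd-Warshall for reference:

```python
import numpy as np

def floyd_warshall(D):
    """All-pairs shortest paths via D[i,k] = min_j (D[i,j] + D[j,k])."""
    D = D.astype(float).copy()
    n = D.shape[0]
    for j in range(n):  # pivot node j, relaxing every pair (i, k)
        D = np.minimum(D, D[:, j:j+1] + D[j:j+1, :])
    return D

INF = np.inf
D = np.array([[0.0, 3.0, INF],
              [3.0, 0.0, 1.0],
              [INF, 1.0, 0.0]])
# shortest path 0 -> 2 pivots through node 1: 3 + 1 = 4
print(floyd_warshall(D))
```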

In FloydNet, the scalar min and addition are replaced with high-dimensional attention-based operators ("Pivotal Attention"). Given $\mathbf{R}^{(l-1)}$, normalized as $\mathbf{R}' = \mathrm{Norm}(\mathbf{R}^{(l-1)})$:

  • Pairwise queries $\mathbf{q}^{(l-1)}_{i,k}$, keys $\mathbf{k}^{(l-1)}_{i,j}$ and $\mathbf{k}^{(l-1)}_{j,k}$, and values are linearly projected for each path $i \rightarrow j \rightarrow k$.
  • Keys and values along two-hop paths are combined (element-wise addition by default).
  • Scaled dot-product attention over all pivot nodes $j$ computes:

$$\mathbf{o}_{i,k} = \sum_{j=1}^{N} \mathrm{softmax}_j\!\left(\frac{\langle \mathbf{q}_{i,k}, \mathbf{k}_{ijk} \rangle}{\sqrt{d_h}}\right) \mathbf{v}_{ijk}$$

The full FloydBlock updates via residual connections and a feed-forward network:

$$\begin{aligned} \widetilde{\mathbf{R}}^{(l)} &= \mathbf{R}^{(l-1)} + \mathrm{PivotalAttn}(\mathrm{Norm}(\mathbf{R}^{(l-1)})) \\ \mathbf{R}^{(l)} &= \widetilde{\mathbf{R}}^{(l)} + \mathrm{FFN}(\mathrm{Norm}(\widetilde{\mathbf{R}}^{(l)})) \end{aligned}$$

This global pattern can be interpreted as learning a task-specific relational calculus. Because each layer composes all two-hop paths, the effective receptive field grows exponentially with depth, enabling long-range reasoning within $\mathcal{O}(L)$ layers.
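The Pivotal Attention equations above can be sketched for a single head with the additive combine. This is an illustrative NumPy rendering under stated assumptions, not the paper's implementation, and it materializes the full $[N, N, N]$ path tensor (the dense form that the optimized kernel discussed later avoids):

```python
import numpy as np

def softmax(a, axis):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def pivotal_attention(R, Wq, Wk, Wv):
    """Single-head Pivotal Attention sketch (additive combine assumed).

    R: [N, N, d_r] relationship tensor; Wq/Wk/Wv: [d_r, d_h] projections.
    Each pair (i, k) attends over pivots j with combined two-hop keys
    k_ij + k_jk and values v_ij + v_jk.
    """
    d_h = Wq.shape[1]
    Q, K, V = R @ Wq, R @ Wk, R @ Wv               # each [N, N, d_h]
    K_path = K[:, :, None, :] + K[None, :, :, :]   # axes [i, j, k, d_h]
    V_path = V[:, :, None, :] + V[None, :, :, :]
    logits = np.einsum('ikd,ijkd->ijk', Q, K_path) / np.sqrt(d_h)
    w = softmax(logits, axis=1)                    # softmax over pivots j
    return np.einsum('ijk,ijkd->ikd', w, V_path)   # [N, N, d_h]

rng = np.random.default_rng(0)
N, d_r, d_h = 4, 6, 6
R = rng.normal(size=(N, N, d_r))
O = pivotal_attention(R, *(rng.normal(size=(d_r, d_h)) for _ in range(3)))
assert O.shape == (N, N, d_h)
```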

3. Expressive Power and Theoretical Properties

FloydNet with $k=2$ directly implements the 2-Folklore WL (2-FWL) color refinement, which is equivalent in expressive power to the 3-WL test. Each pairwise embedding is refined by attending over the multiset of all length-2 paths $(i\text{–}j\text{–}k)$, matching the combinatorial update rule of 3-WL (Yu et al., 27 Jan 2026).

In general, FloydNet$_k$ can update each $k$-tuple via attention over all pivots, effectively realizing the $k$-Folklore WL test. This positions FloydNet firmly within the k-FWL hierarchy and endows the architecture with expressive power strictly beyond 1-WL MPNNs and previously established message-passing GNNs. The achievable expressive power can thus be systematically increased via the tuple size, providing a theoretically principled mechanism for higher-order reasoning.
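The 2-FWL refinement that FloydNet's pivot structure mirrors can be sketched concretely. The helper below is hypothetical illustration code, not the paper's implementation; it recolors each ordered pair by hashing its own color together with the multiset of pivot color pairs, and separates a pair of graphs that 1-WL cannot:

```python
import numpy as np

def two_fwl_colors(A, rounds=3):
    """2-FWL color refinement sketch on ordered node pairs.

    Each round, pair (i, k) gets a new color from its current color
    plus the multiset {(c(i,j), c(j,k)) : all pivots j} -- the same
    pivot structure Pivotal Attention operates over.
    """
    n = A.shape[0]
    # Initial pair colors: (edge indicator, diagonal indicator).
    color = {(i, k): (int(A[i, k]), int(i == k))
             for i in range(n) for k in range(n)}
    for _ in range(rounds):
        color = {(i, k): hash((color[(i, k)],
                               tuple(sorted((color[(i, j)], color[(j, k)])
                                            for j in range(n)))))
                 for i in range(n) for k in range(n)}
    return color

def cycle(n):
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

# C6 vs. two disjoint triangles: same degree sequence, identical under
# 1-WL, but separated by 2-FWL (adjacent triangle pairs share a pivot).
C6 = cycle(6)
TT = np.zeros((6, 6), dtype=int)
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    TT[a, b] = TT[b, a] = 1
assert sorted(two_fwl_colors(C6).values()) != sorted(two_fwl_colors(TT).values())
```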

4. Model Architecture and Implementation

FloydNet is structured as a stack of LL FloydBlocks, each utilizing the Pre-LN Transformer pattern:

  • PivotalAttention: multi-head, head dimension $d_h = d_r / h$.
  • Feed-forward network (FFN): $d_r \rightarrow 4d_r \rightarrow d_r$ with GeLU activation.
  • Normalization: LayerNorm by default; BatchNorm, RMSNorm, and QK-Norm are also supported.
  • Combine operator $\mathcal{C}$: additive (default), multiplicative (for geometric tasks).
  • CUDA kernel: optimized to reduce memory from $\mathcal{O}(N^3)$ to $\mathcal{O}(N^2)$.

A prototypical iterative refinement pseudocode:

for l in 1..L:
  Rn = LayerNorm(R^(l-1))
  Q  = W^Q · Rn    # [N,N,h,d_h]
  K1 = W^K · Rn
  V1 = W^V · Rn
  for each head h:
    for i in 1..N, k in 1..N:
      q = Q[i,k,h,:]
      for j in 1..N:
        k_path[j] = C( K1[i,j,h,:], K1[j,k,h,:] )   # combine two-hop keys
        v_path[j] = C( V1[i,j,h,:], V1[j,k,h,:] )   # combine two-hop values
        a[j] = dot(q, k_path[j]) / sqrt(d_h)
      w = softmax(a)                                # attention over pivots j
      O[i,k,h,:] = sum_j w[j] * v_path[j]
  R' = R^(l-1) + concat_heads(O)
  R^(l) = R' + FFN(LayerNorm(R'))
return R^(L)
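The residual skeleton around the attention operator can also be written out concretely. This is a minimal NumPy rendering of the Pre-LN pattern under stated assumptions, not the paper's code; `attn` stands in for any PivotalAttn-like callable on the relationship tensor:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GeLU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def floyd_block(R, attn, W1, b1, W2, b2):
    """Pre-LN FloydBlock skeleton: attention residual, then FFN residual.

    `attn` maps [N, N, d_r] -> [N, N, d_r]; (W1, b1, W2, b2)
    parameterize the d_r -> 4*d_r -> d_r feed-forward network.
    """
    R1 = R + attn(layer_norm(R))
    h = gelu(layer_norm(R1) @ W1 + b1)
    return R1 + h @ W2 + b2

rng = np.random.default_rng(0)
N, d_r = 3, 8
R = rng.normal(size=(N, N, d_r))
W1, b1 = 0.1 * rng.normal(size=(d_r, 4 * d_r)), np.zeros(4 * d_r)
W2, b2 = 0.1 * rng.normal(size=(4 * d_r, d_r)), np.zeros(d_r)
out = floyd_block(R, lambda x: x, W1, b1, W2, b2)   # identity attn for demo
assert out.shape == R.shape
```

Stacking $L$ such blocks and reading out from $\mathbf{R}^{(L)}$ gives the full model.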

5. Training Regimes and Hyperparameters

FloydNet is trained under domain-specific configurations:

  • BREC (graph isomorphism): $L=32$ layers, $d_r=2$, single head, BatchNorm, FFN removed, float64, AdamW with $\eta=10^{-4}$, batch size 64, no positional encodings.
  • CLRS-30 (algorithmic reasoning): up to $L \approx 80$, $d_r=128$, 6 heads, AdamW with $\eta=10^{-4}$, linear warmup and cosine decay, up to 80k steps, tested OOD up to $n=64$.
  • TSP (combinatorial optimization): $d_r=384$, 6 heads, $L \in \{8, 16, 32, 64, 96\}$, DDPM formulation (binary cross-entropy on edges), 400 epochs × 100 steps, 64 GPUs, batch size 1 (with accumulation), trained on $N \leq 100$, tested on $100 < N \leq 200$, optimality filtered via Concorde.

6. Empirical Evaluation

FloydNet establishes strong or state-of-the-art empirical results across domains:

  • Homomorphism counting: near-zero mean absolute error on all 8 tasks, surpassing GIN, Subgraph-GNN, 2-GNN, and 2-FGNN (all $\leq$ 2-WL).
  • BREC: FloydNet (2-FWL/3-WL) accuracy 67.5% vs. 1-WL GNNs <20%, Graphormer 19.8%, PPGT 58.5%, KP-GNN 68.8%; FloydNet$_3$ achieves 95.0% and FloydNet$_4$ 99.8%, matching 4-WL.
  • CLRS-30: aggregated test accuracy (%):

| Class | Triplet-GMPNN | RANR | G-ForgetNet | RT | ET | FloydNet |
|---|---|---|---|---|---|---|
| Sort (4) | 75.6 | 94.2 | 78.1 | 50.0 | 82.3 | 100.0 |
| Search (3) | 58.8 | 82.9 | 63.8 | 65.3 | 63.0 | 91.6 |
| Greedy (2) | 76.4 | 83.5 | 91.8 | 85.3 | 81.7 | 93.2 |
| DP (3) | 82.0 | 42.7 | 86.7 | 83.2 | 83.5 | 90.0 |
| Graph (12) | 86.4 | 74.2 | 88.8 | 65.3 | 86.1 | 98.6 |
| String (2) | 49.1 | 49.1 | 54.7 | 32.5 | 54.8 | 99.7 |
| Geometry (3) | 94.1 | 88.4 | 95.1 | 84.6 | 88.2 | 99.7 |
| Total (30) | 80.0 | 75.8 | 82.9 | 66.2 | 80.1 | 96.6 |

  • FloydNet maintains >90% accuracy up to $n=256$ in OOD settings (no-hint), while hint-augmented models often degrade or run out of memory.
  • TSP ($100 \leq N \leq 200$): Linkern heuristic optimality 38.8% (general), 16.1% ($180 \leq N \leq 200$); FloydNet (10 samples) 99.8% overall, 99.4% on large $N$.
  • LRGB/ZINC: competitive or superior on selected molecule and vision tasks (e.g., ZINC-full MAE 0.016).

7. Comparison with Message Passing Paradigms and Future Directions

MPNNs propagate information solely along local graph edges, causing over-squashing and constraining expressivity to 1-WL (sometimes 2-WL with extensions). FloydNet, by contrast, leverages a global $N^2$ pairwise representation and DP-style refinement, resulting in exponentially fast receptive field growth ($2^L$ paths in $L$ layers) and provable 3-WL (2-FWL) expressiveness.

The major tradeoff is an increased computational and memory cost: FloydNet's standard implementation incurs cubic complexity $\mathcal{O}(N^3)$. This is mitigated for practical graph sizes ($N \lesssim 256$) with optimized kernels. Its strengths include exact emulation of dynamic programming routines (e.g., Floyd–Warshall), and superior suitability for tasks requiring long-range or combinatorial reasoning.
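A back-of-envelope calculation illustrates why fusing the pivot axis matters. Assuming float32 activations, $N = 256$, and head dimension 64 (illustrative numbers, not from the paper):

```python
# Memory for the pivot axis at N = 256, d_h = 64, float32 (4 bytes):
N, d_h, bytes_per = 256, 64, 4

# Naive: materialize the combined key/value tensor over all (i, j, k) paths.
naive = N * N * N * d_h * bytes_per   # O(N^3) intermediate
# Fused kernel: only the O(N^2) output tensor is ever stored.
fused = N * N * d_h * bytes_per

print(naive / 2**30, "GiB naive vs", fused / 2**20, "MiB fused")
# -> 4.0 GiB naive vs 16.0 MiB fused
```

A 256x reduction per layer at this size, which is what makes graphs up to $N \approx 256$ practical.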

Current limitations include the cubic scaling and tailoring to moderate-size dense graphs. Open research areas include sparse pivot selection, approximate DP refinement, and multimodal extensions. FloydNet establishes learned DP refinement as a high-expressivity, empirically strong, and theoretically principled alternative to local message passing for global graph reasoning tasks (Yu et al., 27 Jan 2026).
