FloydNet: DP-Style Global Graph Learning
- FloydNet is a graph learning architecture that uses global DP-style iterative refinement to capture complex combinatorial and relational reasoning.
- It employs a dense all-pairs relationship tensor with a learnable operator inspired by the Floyd-Warshall algorithm to achieve high-order expressive power.
- Empirical results show state-of-the-art performance on benchmarks including CLRS-30, TSP, BREC, and molecule property prediction.
FloydNet is a graph learning architecture that realizes global, dynamic programming (DP)-style iterative refinement for combinatorial, algorithmic, and relational reasoning tasks. It departs from message-passing graph neural networks (MPNNs) by operating on a dense, all-pairs relationship representation, using a learnable operator inspired by the Floyd-Warshall algorithm. FloydNet attains higher-order expressive power, precisely implements generalized k-Folklore Weisfeiler-Lehman (k-FWL) color refinement, and achieves state-of-the-art empirical performance across a suite of challenging benchmarks, including CLRS-30, BREC, Traveling Salesman Problem (TSP), and molecule property prediction (Yu et al., 27 Jan 2026).
1. Global All-Pairs Representation
At the core of FloydNet is a dense, global "relationship tensor" maintained at every layer $l$:

$$R^{(l)} \in \mathbb{R}^{N \times N \times d}.$$

Here, $N$ is the number of graph nodes and $d$ is the hidden dimension. Each entry $R^{(l)}_{ij}$ encodes the current embedding of the relationship between nodes $i$ and $j$. The initialization aggregates node features $x_i$ and $x_j$, edge features $e_{ij}$, and global features $g$ using a multi-layer perceptron:

$$R^{(0)}_{ij} = \mathrm{MLP}\big([\,x_i \,\|\, x_j \,\|\, e_{ij} \,\|\, g\,]\big).$$
Such a design enables direct modeling of long-range and high-order dependencies, in contrast to local aggregation schemes in standard GNNs.
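As an illustration, the initialization step can be sketched in NumPy. The concatenation order, two-layer MLP shape, and all dimensions here are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(z, W1, b1, W2, b2):
    """Two-layer MLP with ReLU, applied along the last axis."""
    return np.maximum(z @ W1 + b1, 0.0) @ W2 + b2

N, d_node, d_edge, d_glob, d = 5, 4, 3, 2, 8
x = rng.normal(size=(N, d_node))     # node features x_i
e = rng.normal(size=(N, N, d_edge))  # edge features e_ij
g = rng.normal(size=(d_glob,))       # global features g

# Concatenate [x_i, x_j, e_ij, g] for every ordered pair (i, j).
xi = np.broadcast_to(x[:, None, :], (N, N, d_node))
xj = np.broadcast_to(x[None, :, :], (N, N, d_node))
gg = np.broadcast_to(g, (N, N, d_glob))
z = np.concatenate([xi, xj, e, gg], axis=-1)  # [N, N, d_in]

d_in = z.shape[-1]
W1, b1 = rng.normal(size=(d_in, d)), np.zeros(d)
W2, b2 = rng.normal(size=(d, d)), np.zeros(d)
R0 = mlp(z, W1, b1, W2, b2)  # initial relationship tensor, [N, N, d]
print(R0.shape)  # (5, 5, 8)
```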
2. FloydBlock: Learned DP-Style Refinement
Each FloydNet layer, termed a "FloydBlock," performs a global update of the relationship tensor using a learnable analogue of the Floyd-Warshall update

$$D_{ik} \leftarrow \min_{j}\big(D_{ik},\; D_{ij} + D_{jk}\big).$$
In FloydNet, the scalar min and addition are replaced with high-dimensional attention-based operators ("Pivotal Attention"). Given $R^{(l-1)}$, normalized as $\bar{R} = \mathrm{LayerNorm}(R^{(l-1)})$:
- Pairwise queries $Q_{ik}$, keys $K_{ij}$ and $K_{jk}$, and values $V_{ij}$ and $V_{jk}$ are linearly projected for each path $i \to j \to k$.
- Keys and values along two-hop paths are combined via an operator $C$ (element-wise addition, default): $K_{ijk} = C(K_{ij}, K_{jk})$, $V_{ijk} = C(V_{ij}, V_{jk})$.
- Scaled dot-product attention over all pivot nodes $j$ computes:

$$O_{ik} = \sum_{j} \mathrm{softmax}_j\!\left(\frac{Q_{ik} \cdot K_{ijk}}{\sqrt{d_h}}\right) V_{ijk}.$$

The full FloydBlock updates via residual connections and a feed-forward network:

$$R' = R^{(l-1)} + O, \qquad R^{(l)} = R' + \mathrm{FFN}(\mathrm{LayerNorm}(R')).$$
This global pattern can be interpreted as learning a task-specific relational calculus. Because each FloydBlock composes two-hop paths, $L$ layers cover paths of length up to $2^L$, enabling long-range reasoning in $O(\log N)$ layers through exponential receptive-field growth.
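For reference, the scalar dynamic program that Pivotal Attention generalizes is the classic Floyd–Warshall recurrence, with min replaced by attention over pivots and + replaced by the learned combine operator:

```python
import math

def floyd_warshall(dist):
    """Classic all-pairs shortest paths: d[i][k] = min(d[i][k], d[i][j] + d[j][k])."""
    n = len(dist)
    d = [row[:] for row in dist]
    for j in range(n):  # pivot node: the role attention ranges over in FloydNet
        for i in range(n):
            for k in range(n):
                if d[i][j] + d[j][k] < d[i][k]:
                    d[i][k] = d[i][j] + d[j][k]
    return d

INF = math.inf
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
print(floyd_warshall(graph))
```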
3. Expressive Power and Theoretical Properties
FloydNet operating on node pairs directly implements the 2-Folklore WL (2-FWL) color refinement, which is equivalent in expressive power to the 3-WL test. Each pairwise embedding $R_{ik}$ is refined by attending over the multiset of all length-2 paths $\{(R_{ij}, R_{jk})\}_{j}$, matching the combinatorial update rule of 3-WL (Yu et al., 27 Jan 2026).
In general, $k$-FloydNet can update each $k$-tuple via attention over all pivots, effectively realizing the $k$-Folklore WL test. This positions FloydNet firmly within the $k$-FWL hierarchy and endows the architecture with expressive power strictly beyond 1-WL MPNNs and previously established message-passing GNNs. The achievable expressive power can thus be systematically increased via the tuple size $k$, providing a theoretically principled mechanism for higher-order reasoning.
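The refinement FloydNet emulates can be made concrete with a toy 2-FWL implementation; the nested-tuple "hashing" below is a simplification for illustration. It separates a 6-cycle from two disjoint triangles, a pair of 2-regular graphs that 1-WL cannot distinguish:

```python
from itertools import product

def two_fwl_colors(adj, rounds=2):
    """2-FWL refinement on ordered node pairs: each pair (i, k) is updated
    with the multiset over pivots j of (color(i, j), color(j, k)),
    mirroring FloydNet's attention over all pivots."""
    n = len(adj)
    # Initial pair colors: (is i == j, is there an edge i-j)
    color = {(i, j): (i == j, adj[i][j]) for i, j in product(range(n), repeat=2)}
    for _ in range(rounds):
        color = {
            (i, k): (color[(i, k)],
                     tuple(sorted((color[(i, j)], color[(j, k)]) for j in range(n))))
            for i, k in product(range(n), repeat=2)
        }
    return sorted(map(repr, color.values()))  # canonical color histogram

def cycle(n):
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        a[i][(i + 1) % n] = a[(i + 1) % n][i] = 1
    return a

c6 = cycle(6)                       # one 6-cycle
two_c3 = [[0] * 6 for _ in range(6)]  # two disjoint triangles
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]:
    two_c3[i][j] = two_c3[j][i] = 1

# Both graphs are 2-regular, so 1-WL cannot separate them; 2-FWL can.
print(two_fwl_colors(c6) != two_fwl_colors(two_c3))  # True
```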
4. Model Architecture and Implementation
FloydNet is structured as a stack of FloydBlocks, each utilizing the Pre-LN Transformer pattern:
- PivotalAttention: multi-head, with head dimension $d_h$.
- Feed-forward network (FFN) with GeLU activation.
- Normalization: LayerNorm by default; BatchNorm, RMSNorm, and QK-Norm are also supported.
- Combine operator $C$: additive (default), multiplicative (for geometric tasks).
- CUDA kernel: optimized to reduce attention memory from $O(N^3)$ to $O(N^2)$.
A prototypical iterative refinement pseudocode:
```
for l in 1..L:
    Rm1 = LayerNorm(R^(l-1))
    Q  = W^Q · Rm1        # [N, N, h, d_h]
    K1 = W^K · Rm1
    V1 = W^V · Rm1
    for each head h:
        for i in 1..N, k in 1..N:
            q = Q[i, k, h, :]
            for j in 1..N:
                k_path[j] = C(K1[i, j, h, :], K1[j, k, h, :])
                v_path[j] = C(V1[i, j, h, :], V1[j, k, h, :])
                a[j] = q · k_path[j] / sqrt(d_h)
            w = softmax(a)                     # normalize over pivots j
            O[i, k, h, :] = sum_j w[j] * v_path[j]
    R' = R^(l-1) + concat_heads(O)
    R^(l) = R' + FFN(LayerNorm(R'))
return R^(L)
```
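A vectorized, single-head rendering of this loop (additive combine, LayerNorm without learned affine parameters, residual only, no FFN) might look as follows in NumPy; this is a sketch of the computation, not the released implementation:

```python
import numpy as np

def floyd_block(R, Wq, Wk, Wv, d_h):
    """One single-head FloydBlock step: every pair (i, k) attends over all
    pivots j with key/value C(K[i,j], K[j,k]) = K[i,j] + K[j,k]."""
    # LayerNorm over the feature axis (no learned affine parameters)
    Rn = (R - R.mean(-1, keepdims=True)) / (R.std(-1, keepdims=True) + 1e-5)
    Q, K, V = Rn @ Wq, Rn @ Wk, Rn @ Wv           # each [N, N, d_h]
    K_path = K[:, :, None, :] + K[None, :, :, :]  # [i, j, k, d] = K[i,j] + K[j,k]
    V_path = V[:, :, None, :] + V[None, :, :, :]
    logits = np.einsum('ikd,ijkd->ikj', Q, K_path) / np.sqrt(d_h)
    w = np.exp(logits - logits.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                 # softmax over pivots j
    O = np.einsum('ikj,ijkd->ikd', w, V_path)
    return R + O                                  # residual connection

rng = np.random.default_rng(0)
N, d = 4, 8
R = rng.normal(size=(N, N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = floyd_block(R, Wq, Wk, Wv, d_h=d)
print(out.shape)  # (4, 4, 8)
```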
5. Training Regimes and Hyperparameters
FloydNet is trained under domain-specific configurations:
- BREC (graph isomorphism): single-head Pivotal Attention, BatchNorm, FFN removed, float64 precision, AdamW, batch size 64, no positional encodings.
- CLRS-30 (algorithmic reasoning): 6 heads, AdamW with linear warmup and cosine decay, up to 80k steps, tested out-of-distribution on graph sizes beyond those seen in training.
- TSP (combinatorial optimization): 6 heads, DDPM formulation (binary cross-entropy on edge variables), 400 epochs × 100 steps, 64 GPUs, batch size 1 with gradient accumulation, trained on small instances and tested on larger ones, with optimality evaluated against Concorde solutions.
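The linear-warmup-plus-cosine-decay schedule mentioned for CLRS-30 can be written as a small function; the warmup length and peak learning rate below are placeholder values, not the paper's settings:

```python
import math

def lr_schedule(step, max_steps, base_lr, warmup_steps):
    """Linear warmup to base_lr, then cosine decay to zero at max_steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Peak at the end of warmup, zero at the final step (80k steps as in the text).
print(lr_schedule(999, 80_000, 1e-3, 1000))     # 0.001
print(lr_schedule(80_000, 80_000, 1e-3, 1000))  # ~0.0
```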
6. Empirical Evaluation
FloydNet establishes strong or state-of-the-art empirical results across domains:
- Homomorphism Counting: Near-zero mean absolute error on all 8 tasks, surpassing GIN, Subgraph-GNN, 2-GNN, 2-FGNN (all 2-WL).
- BREC: FloydNet (2-FWL/3-WL expressive) reaches 67.5% accuracy, versus markedly lower scores for 1-WL GNNs, 19.8% for Graphormer, 58.5% for PPGT, and 68.8% for KP-GNN; higher-order $k$-FloydNet variants reach 95.0% and 99.8%, matching 4-WL.
- CLRS-30: Aggregated test accuracy (%):
| Class | Triplet-GMPNN | RANR | G-ForgetNet | RT | ET | FloydNet |
|---|---|---|---|---|---|---|
| Sort (4) | 75.6 | 94.2 | 78.1 | 50.0 | 82.3 | 100.0 |
| Search (3) | 58.8 | 82.9 | 63.8 | 65.3 | 63.0 | 91.6 |
| Greedy (2) | 76.4 | 83.5 | 91.8 | 85.3 | 81.7 | 93.2 |
| DP (3) | 82.0 | 42.7 | 86.7 | 83.2 | 83.5 | 90.0 |
| Graph (12) | 86.4 | 74.2 | 88.8 | 65.3 | 86.1 | 98.6 |
| String (2) | 49.1 | 49.1 | 54.7 | 32.5 | 54.8 | 99.7 |
| Geometry (3) | 94.1 | 88.4 | 95.1 | 84.6 | 88.2 | 99.7 |
| Total (30) | 80.0 | 75.8 | 82.9 | 66.2 | 80.1 | 96.6 |
- FloydNet maintains >90% accuracy on substantially larger out-of-distribution graphs in the no-hint setting, while hint-augmented models often degrade or run out of memory.
- TSP: the Linkern heuristic attains 38.8% optimality in general and 16.1% on large instances; FloydNet (10 samples) reaches 99.8% overall and 99.4% on large instances.
- LRGB/ZINC: Competitive or superior on selected molecule and vision tasks (e.g., ZINC-full MAE 0.016).
7. Comparison with Message Passing Paradigms and Future Directions
MPNNs propagate information solely along local graph edges, causing over-squashing and constraining expressivity to 1-WL (sometimes 2-WL with extensions). FloydNet, by contrast, leverages a global pairwise representation and DP-style refinement, resulting in exponentially fast receptive-field growth (path lengths up to $2^L$ after $L$ layers) and provable 3-WL (2-FWL) expressiveness.
The major tradeoff is increased computational and memory cost: FloydNet's standard implementation incurs cubic complexity $O(N^3)$ in the number of nodes. This is mitigated for practical graph sizes with optimized kernels. Its strengths include exact emulation of dynamic programming routines (e.g., Floyd–Warshall) and superior suitability for tasks requiring long-range or combinatorial reasoning.
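A back-of-the-envelope accounting illustrates why kernel optimization matters; the specific buffers counted here are assumptions for illustration, not the actual kernel's layout:

```python
def pivotal_attention_elements(N, d_h, streamed=False):
    """Illustrative live-element counts for one attention head (assumed buffers):
    a naive kernel materializes the full [N, N, N] pivot logits plus combined
    key/value paths; a streamed kernel keeps only per-pair accumulators."""
    if streamed:
        return N * N * (d_h + 2)        # output accumulator + running max/denominator
    return N * N * N * (1 + 2 * d_h)    # logits + K_path + V_path over all pivots

naive = pivotal_attention_elements(1000, 64)
stream = pivotal_attention_elements(1000, 64, streamed=True)
print(naive // stream)  # the cubic-vs-quadratic gap grows linearly with N
```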
Current limitations include the cubic scaling and tailoring to moderate-size dense graphs. Open research areas include sparse pivot selection, approximate DP refinement, and multimodal extensions. FloydNet establishes learned DP refinement as a high-expressivity, empirically strong, and theoretically principled alternative to local message passing for global graph reasoning tasks (Yu et al., 27 Jan 2026).