
FlowerFormer: Flow-Aware NAS Encoder

Updated 14 February 2026
  • FlowerFormer is a neural architecture encoder that leverages flow-aware graph transformer mechanisms to accurately estimate neural network performance in NAS settings.
  • It employs bidirectional asynchronous message passing to mimic both forward inference and backward gradient flows, capturing local and global context effectively.
  • Empirical benchmarks demonstrate that FlowerFormer outperforms traditional GNNs and standard transformers, achieving superior ranking accuracy across CV, graph, and ASR tasks.

FlowerFormer is a neural architecture encoding model designed to improve the estimation of neural network performance by leveraging flow-aware graph transformer mechanisms. FlowerFormer is tailored for the Neural Architecture Search (NAS) setting, where rapid and accurate predictions of architecture performance are required without full model training. It is characterized by its incorporation of both local information flow directionality and global graph context, achieved through a combination of bidirectional asynchronous message passing and flow-constrained global attention. Empirical evaluations demonstrate that FlowerFormer outperforms leading graph neural networks (GNNs), conventional graph transformers, and recent flow-based encoding methods across computer vision, graph neural network, and speech recognition architecture benchmarks (Hwang et al., 2024).

1. Motivation for Flow-aware Architecture Encoding

The performance of a neural architecture is intimately linked to its topological structure and the data/task for which it has been designed. Evaluating candidate networks using full training is often computationally infeasible in NAS and related tasks that involve searching or ranking vast sets of candidate graphs. The core challenge is thus to construct an encoder that produces informative feature representations for neural architectures, suitable for accurate performance prediction.

Traditional GNNs (e.g., GCN, GatedGCN, DAGNN) model architectures as directed acyclic graphs (DAGs) and perform local message passing, but these methods are limited by over-smoothing and an inability to capture long-range dependencies. Graph transformers introduce global attention but lack sensitivity to flow direction and to the semantics of how real neural-network quantities (activations and gradients) propagate. An effective encoder must address three key limitations:

  • Flows ignored: Directionality and order of both forward inference and backward gradient flows are essential but not considered by standard transformers.
  • Missing global context: GNNs lack the ability to synthesize information over long graph distances.
  • Expressiveness vs efficiency: The encoder should balance expressive power and computational speed, enabling high-throughput predictions.

FlowerFormer introduces two core innovations to resolve these limitations: (a) bidirectional asynchronous message passing that mimics actual inference and backward-propagation flows, and (b) a flow-aware global attention mechanism that restricts attention to path-reachable node pairs, ensuring flow-respecting information mixing (Hwang et al., 2024).

2. Model Architecture and Core Modules

In FlowerFormer, each layer (termed a "Flower" layer) processes an input network DAG $G = (A, X)$, where $A$ is the adjacency matrix and $X$ indicates node (operation) types. The encoding pipeline proceeds as follows:

  1. Node encoding: Input node types $X \in \{0,1\}^{N \times D}$ (one-hot) are projected into embedding space via $H^{(0)} = XP$, with $P \in \mathbb{R}^{D \times d}$ learnable.
  2. Layerwise encoding: For each of $L$ layers, node embeddings $H^{(\ell)}$ are refined using two distinct submodules operating in parallel:
    • Flow-encode module: Performs bidirectional asynchronous message passing, first in topologically sorted (forward) then reverse (backward) order, imitating actual activation and gradient flows.
    • Flow-aware global attention: Implements masked multi-head attention constrained such that node $i$ can attend only to nodes that share a forward or backward path with it, enforced by a reachability-based mask.

The outputs of the two modules are fused via a skip-connected feed-forward network (FFN), yielding new node representations for the next layer.

  3. Graph-level readout: Final-layer node representations are aggregated as $z_G = \mathrm{READOUT}(H^{(L)})$, typically by mean pooling, and passed through a linear regressor to approximate the performance metric, $\hat{y}_G = \mathrm{Linear}(z_G)$.

Loss optimization is performed with a margin ranking objective, emphasizing relative performance ordering between architecture pairs.
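A margin ranking objective of this kind can be sketched for a single architecture pair as follows; the function name, signature, and margin value are illustrative, not taken from the paper:

```python
def margin_ranking_loss(pred_i, pred_j, true_i, true_j, margin=0.1):
    """Hinge-style ranking loss on a pair of architectures.

    Penalizes the predictor when the architecture with the higher true
    performance is not scored at least `margin` above the other one.
    The margin of 0.1 is an illustrative value.
    """
    sign = 1.0 if true_i > true_j else -1.0
    return max(0.0, margin - sign * (pred_i - pred_j))

# A correctly ordered pair with a comfortable gap incurs zero loss...
print(margin_ranking_loss(0.9, 0.5, true_i=0.94, true_j=0.91))  # 0.0
# ...while a mis-ordered pair is penalized.
print(margin_ranking_loss(0.4, 0.8, true_i=0.94, true_j=0.91))  # 0.5
```

Because only relative ordering enters the loss, the predictor is trained for ranking agreement (the quantity that matters in NAS) rather than for calibrated absolute accuracy values.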

3. Bidirectional Asynchronous Message Passing

The bidirectional asynchronous message passing (the Flower "Flow-encode" module) is designed to reflect the sequential nature of both activation flow during inference and gradient flow during backpropagation in real architectures. The algorithm executes in two passes:

  • Forward pass: Nodes are updated in topological order, using messages from incoming neighbors. Updates are performed in-place, capturing the cumulative effect of upstream computations.
  • Backward pass: Nodes are updated in reverse topological order, using outgoing neighbors, simulating backward information flow.

The concrete message and combination functions are

$$m_e(h_j, h_i) = \mathrm{softmax}(w_1^\top h_j + w_2^\top h_i)\, h_i$$

$$\mathrm{Comb}(h_j, \mathrm{msg}_j) = \mathrm{GRU}(h_j, \mathrm{msg}_j)$$

where $\mathrm{msg}_j$ is the sum of messages arriving at node $j$, and $w_1, w_2$ are learnable vectors. This in-place, ordered update ensures that each node's embedding reflects flow-aware context at every stage of propagation.
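The two-pass structure can be sketched with a toy combine function standing in for the attention-weighted messages and GRU update above; the function name `flow_encode` and the scalar node states are illustrative, not from the paper:

```python
from graphlib import TopologicalSorter

def flow_encode(dag, h, combine):
    """Bidirectional asynchronous message passing over a DAG.

    dag:     {node: set of predecessor nodes}, as used by graphlib.
    h:       {node: state}, updated in place.
    combine: toy stand-in for the paper's message + GRU update.
    """
    order = list(TopologicalSorter(dag).static_order())
    # Forward pass (activation flow): each node sees the
    # already-updated states of its inputs.
    for v in order:
        preds = dag.get(v, set())
        if preds:
            h[v] = combine(h[v], [h[u] for u in preds])
    # Backward pass (gradient flow): reverse order, messages
    # come from successors instead.
    succs = {v: [w for w, ps in dag.items() if v in ps] for v in order}
    for v in reversed(order):
        if succs[v]:
            h[v] = combine(h[v], [h[w] for w in succs[v]])
    return h

# Tiny chain in -> conv -> out with additive combine:
dag = {"in": set(), "conv": {"in"}, "out": {"conv"}}
h = {"in": 1.0, "conv": 0.0, "out": 0.0}
flow_encode(dag, h, lambda s, msgs: s + sum(msgs))
print(h)  # {'in': 3.0, 'conv': 2.0, 'out': 1.0}
```

The key point the sketch preserves is asynchrony: within each pass, a node is updated only after all of its upstream (or downstream) neighbors, so messages already carry accumulated flow context, exactly as activations and gradients do in a real network.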

4. Flow-aware Global Attention Mechanism

After flow encoding, a multi-head attention step operates on the node embeddings, with a mask restricting attention to flow-reachable pairs only. Specifically, the mask $M \in \mathbb{R}^{N \times N}$ is defined as

$$M_{ij} = \begin{cases} 0, & \text{if } i \text{ is reachable from } j \text{ or } j \text{ is reachable from } i \\ -\infty, & \text{otherwise} \end{cases}$$

With this mask, attention scores between topologically disconnected node pairs are set to $-\infty$ before the softmax, preventing spurious communication. Standard masked multi-head attention (MMHA) then proceeds as in vanilla transformers. This mechanism ensures that only information along feasible computation/gradient paths is integrated at the global level.
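A minimal sketch of the mask construction and its use in single-head attention, assuming a dense adjacency matrix and computing the transitive closure by repeated boolean matrix products (a real implementation would use a linear-time DAG traversal and multiple heads):

```python
import numpy as np

def reachability_mask(adj):
    """Additive attention mask: 0 where i and j lie on a common directed
    path (in either direction), -inf otherwise. adj is the N x N
    adjacency matrix of the DAG."""
    n = adj.shape[0]
    reach = np.eye(n, dtype=bool) | adj.astype(bool)
    for _ in range(n):  # transitive closure via repeated products
        reach = reach | ((reach.astype(int) @ reach.astype(int)) > 0)
    return np.where(reach | reach.T, 0.0, -np.inf)

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with an additive mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1]) + mask
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# DAG 0 -> 1, with node 2 disconnected: node 0 may attend to 0 and 1,
# but never to 2.
mask = reachability_mask(np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]]))
out = masked_attention(np.eye(3), np.eye(3), np.eye(3), mask)
```

Because disallowed entries receive weight $e^{-\infty} = 0$ after the softmax, disconnected subgraphs contribute exactly nothing to each other's representations.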

5. Experimental Benchmarks and Comparative Performance

Experimental evaluations of FlowerFormer were conducted on a range of architectural benchmarks:

  • NAS-Bench-101 (CV, 14K DAGs)
  • NAS-Bench-201 (CV, 15K DAGs)
  • NAS-Bench-301 (CV, 57K DAGs, surrogate benchmark)
  • NAS-Bench-Graph (GNN, 26K DAGs)
  • NAS-Bench-ASR (speech recognition, 8K DAGs)

Performance metrics include Kendall’s Tau correlation (ranking agreement) and Precision@K (top-K accuracy). FlowerFormer consistently attains the highest Tau across all data splits (1%, 5%, 10%, 50%) and benchmarks; for example, on NAS-Bench-101 (5%) FlowerFormer achieves $\tau = 0.861$, outperforming TA-GATES and GraphGPS. Precision@K results show FlowerFormer ranking first in 10 out of 12 subsettings and second in the remaining ones. Comparable improvements were observed on graph and ASR benchmarks, demonstrating FlowerFormer’s cross-domain generality (Hwang et al., 2024).
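Kendall's Tau can be illustrated with a tiny tau-a implementation on a handful of predicted and ground-truth scores (ties counted as neither concordant nor discordant; in practice a library routine such as SciPy's `kendalltau` would be used):

```python
from itertools import combinations

def kendall_tau(pred, true):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs.

    Measures how well predicted scores rank architectures by their true
    accuracy, independent of the scores' absolute scale.
    """
    sign = lambda x: (x > 0) - (x < 0)
    pairs = list(combinations(range(len(pred)), 2))
    s = sum(sign(pred[i] - pred[j]) * sign(true[i] - true[j])
            for i, j in pairs)
    return s / len(pairs)

print(kendall_tau([0.1, 0.2, 0.3], [1, 2, 3]))  # 1.0  (perfect ranking)
print(kendall_tau([0.3, 0.2, 0.1], [1, 2, 3]))  # -1.0 (fully reversed)
```

A value of 1 means the predictor orders every architecture pair correctly; 0 means its ordering is no better than random.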

6. Ablation Analysis and Component Importance

Ablation studies systematically remove or replace components of FlowerFormer to assess their importance. The following major alternatives were compared:

  1. No flow-encode (pure graph transformer)
  2. Synchronous message passing (instead of asynchronous)
  3. Forward-only message passing (no backward pass)
  4. No flow mask in the attention

Findings indicate that:

  • Removing asynchronous order leads to substantial degradation (Tau drop of ~10–40 points).
  • Forward-only propagation reduces Tau by 2–10 points.
  • Removing the flow-aware mask in attention also reduces effectiveness, though less dramatically.

Attention visualization further supports that the flow-based mask restricts attention to ancestor/descendant nodes, preventing mixing between unrelated subgraphs. Each architectural principle—message-pass order, bidirectional processing, and flow-aware global mixing—contributes measurably to overall accuracy.

7. Extensions and Future Directions

FlowerFormer is the first graph transformer framework explicitly designed to respect the intrinsic information flow characteristics of neural networks, outperforming pure GNNs, generic graph transformers, and prior flow-oriented models in NAS-relevant domains. Future work suggested includes:

  • Extension to cross-cell or cross-stage flow-aware attention for more complex NAS settings.
  • Richer semantic modeling of edges and operations within the Flow-encode module.
  • Enhancing the encoder with dataset or task metadata.
  • Efficient mask pruning strategies to scale to very large graphs.

The full implementation and trained models for FlowerFormer are publicly available, providing a reference implementation for research and application in neural architecture encoding (Hwang et al., 2024).
