
EATNet-B: Elastic Architecture Transfer for NAS

Updated 23 April 2026
  • The paper presents the EATNet-B architecture discovered via elastic architecture transfer, achieving 74.2% Top-1 accuracy on ImageNet with only 5.3M parameters.
  • EATNet-B is a convolutional neural network defined by 7 sequential blocks using MobileNetV2-style inverted bottlenecks and depthwise separable convolutions.
  • The elastic transfer mechanism perturbs a high-performing seed model locally, cutting search cost by over 50× relative to baseline NAS methods while maintaining competitive performance.

EATNet-B is a convolutional neural network architecture discovered by the two-stage Elastic Architecture Transfer for Neural Architecture Search (EAT-NAS) framework. Within the context of large-scale neural architecture search (NAS), EATNet-B exemplifies an efficient mechanism for transferring architecture designs from small-scale datasets to computationally intensive, large-scale tasks, specifically ImageNet classification. The network achieves competitive performance at a fraction of the computational cost expended by baseline NAS methods, by employing “elastic” perturbations of a high-performing model initially evolved on CIFAR-10 (Fang et al., 2019).

1. Architectural Specification

EATNet-B comprises 7 sequential blocks in a feed-forward configuration. The architecture search operates within the MobileNetV2-style inverted bottleneck/depth-wise separable convolution space. Each block $B^i$ is parameterized by a 5-tuple $B^i = (\mathrm{convType},\,k_i,\,s_i,\,w_i,\,d_i)$ with the following primitive domains:

  • $\mathrm{convType} \in \{\mathrm{SepConv}, \mathrm{MBConv3}, \mathrm{MBConv6}\}$ (convolution type)
  • $k_i \in \{3,5,7\}$ (kernel size)
  • $s_i \in \{0,1\}$ (presence of skip-connection)
  • $w_i \in \{0.5, 1.0, 1.5, 2.0\}$ (width expansion factor)
  • $d_i \in \{1,2,3,4\}$ (layers per block)
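The block parameterization above can be sketched as a small data structure. This is a minimal illustration, not the authors' code: the field names, the `Block` class, and the helper `configs_per_block` are hypothetical, and the count assumes every combination of primitives is a valid block.

```python
from dataclasses import dataclass
from math import prod

# Primitive domains exactly as listed above; the key names are illustrative.
SEARCH_SPACE = {
    "conv_type": ("SepConv", "MBConv3", "MBConv6"),
    "kernel": (3, 5, 7),
    "skip": (0, 1),
    "width": (0.5, 1.0, 1.5, 2.0),
    "depth": (1, 2, 3, 4),
}
NUM_BLOCKS = 7


@dataclass(frozen=True)
class Block:
    """One block B^i = (convType, k_i, s_i, w_i, d_i)."""
    conv_type: str
    kernel: int
    skip: int
    width: float
    depth: int

    def __post_init__(self):
        # Reject any value that lies outside its primitive domain.
        for name, domain in SEARCH_SPACE.items():
            value = getattr(self, name)
            if value not in domain:
                raise ValueError(f"{name}={value!r} not in {domain}")


def configs_per_block() -> int:
    """Number of distinct 5-tuples, assuming all combinations are allowed."""
    return prod(len(domain) for domain in SEARCH_SPACE.values())


if __name__ == "__main__":
    block = Block("MBConv6", kernel=5, skip=1, width=1.0, depth=2)
    print(block, configs_per_block(), configs_per_block() ** NUM_BLOCKS)
```

Under that assumption each block has 3 × 3 × 2 × 4 × 4 configurations, and a 7-block network has that number raised to the 7th power, which is why exhaustive search is impractical.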

The backbone features a narrow SepConv head, followed by blocks utilizing MBConv6 operations with varying kernel sizes and skip configurations; the terminal block implements standard classification head operations (global average pooling, 1000-way FC, softmax). The final model size is 5.3 M parameters and 551 M multiply-adds at standard ImageNet input (224×224).
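To make the multiply-add figure concrete, the cost of a single layer can be estimated with the standard counting rules for depthwise separable and inverted-bottleneck convolutions. The sketch below is illustrative only: it assumes stride 1 with "same" padding, ignores bias, normalization, and activation costs, and assumes that MBConv3 and MBConv6 denote expansion ratios of 3 and 6 (the MnasNet naming convention, not something stated above). The function names and the example layer shape are hypothetical.

```python
def sep_conv_mult_adds(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-adds of a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = h * w * c_in * k * k      # one k x k filter per input channel
    pointwise = h * w * c_in * c_out      # 1x1 conv mixing channels
    return depthwise + pointwise


def mbconv_mult_adds(h: int, w: int, c_in: int, c_out: int, k: int,
                     expansion: int) -> int:
    """Multiply-adds of an inverted bottleneck: 1x1 expand, depthwise, 1x1 project."""
    c_mid = c_in * expansion
    expand = h * w * c_in * c_mid         # 1x1 conv widening the channels
    depthwise = h * w * c_mid * k * k     # k x k depthwise conv on the wide tensor
    project = h * w * c_mid * c_out       # 1x1 conv back down to c_out
    return expand + depthwise + project


if __name__ == "__main__":
    # Hypothetical layer: 14x14 feature map, 80 -> 80 channels, expansion 6.
    for k in (3, 5, 7):
        print(k, mbconv_mult_adds(14, 14, 80, 80, k, expansion=6))
```

Because the two 1×1 convolutions dominate the count, enlarging the kernel is comparatively cheap, while the expansion ratio and width factor move the total much more.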

2. Elastic Architecture Transfer Mechanism

EATNet-B results from the elastic transfer of a “seed” architecture optimized on CIFAR-10. Upon convergence, the top performer on CIFAR-10 ($Arch_{basic}$) serves as the initialization for the ImageNet search. The transfer employs an Architecture Perturbation Function: for each individual in the ImageNet population, and for each of the 7 blocks, a single block primitive (randomly selected from type, kernel, skip, width, or depth) is re-sampled from its domain, while all other parameters are retained from the seed.

Mathematically, for block $i$, denoting the seed’s tuple as $B^i_{basic}$, for each derived individual:

  1. Randomly pick a primitive index $p \in \{\mathrm{convType},\,k,\,s,\,w,\,d\}$.
  2. Sample a new value $v$ from the domain of primitive $p$.
  3. Assign $B^i[p] \leftarrow v$, leave other components unchanged.

This elastic approach ensures population diversity concentrated in the local neighborhood of a high-quality solution, facilitating rapid adaptation to the requirements and visual complexity of large-scale data.
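The three perturbation steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: blocks are represented as plain dictionaries, the key names and the functions `perturb_block` and `perturb_architecture` are hypothetical, and re-sampling draws uniformly from the whole domain, so it may return the value the seed already had.

```python
import random

# Primitive domains from Section 1; the key names are illustrative.
SEARCH_SPACE = {
    "conv_type": ("SepConv", "MBConv3", "MBConv6"),
    "kernel": (3, 5, 7),
    "skip": (0, 1),
    "width": (0.5, 1.0, 1.5, 2.0),
    "depth": (1, 2, 3, 4),
}


def perturb_block(block: dict, rng: random.Random) -> dict:
    """Return a copy of `block` with exactly one primitive re-sampled."""
    primitive = rng.choice(list(SEARCH_SPACE))          # step 1: pick a primitive
    new_value = rng.choice(SEARCH_SPACE[primitive])     # step 2: sample from its domain
    perturbed = dict(block)                             # leave the seed untouched
    perturbed[primitive] = new_value                    # step 3: assign
    return perturbed


def perturb_architecture(seed_arch: list, rng: random.Random) -> list:
    """Apply the block perturbation independently to every block of the seed."""
    return [perturb_block(block, rng) for block in seed_arch]


def init_population(seed_arch: list, size: int, rng: random.Random) -> list:
    """Build an initial population in the local neighborhood of the seed."""
    return [perturb_architecture(seed_arch, rng) for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(0)
    seed = [dict(conv_type="MBConv6", kernel=3, skip=1, width=1.0, depth=2)
            for _ in range(7)]
    for individual in init_population(seed, size=3, rng=rng):
        print(individual[0])
```

Each derived individual differs from the seed in at most one primitive per block, which is what keeps the initial population concentrated around the seed.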

3. Evolutionary Search Process and Objectives

The large-scale search phase deploys a steady-state evolutionary algorithm:

  • Population: size as specified in Fang et al. (2019)
  • Tournament sample: size as specified in Fang et al. (2019)
  • Per-individual training: single epoch on a 50K held-out ImageNet subset, batch size 128, SGD with momentum 0.9 (learning rate and weight decay as specified in Fang et al., 2019)
  • Search duration: ≈100 generations, ≈164 total models (initial + mutated)
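A steady-state evolutionary loop with tournament selection can be sketched as below. This is a generic illustration, not the EAT-NAS procedure itself: the replacement policy (the child overwrites the weakest member of the tournament), the function name `steady_state_evolution`, and the toy fitness and mutation functions in the usage example are all assumptions, and the population and tournament sizes shown are placeholders rather than the paper's values.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def steady_state_evolution(
    init_population: List[T],
    fitness: Callable[[T], float],
    mutate: Callable[[T, random.Random], T],
    tournament_size: int,
    steps: int,
    rng: random.Random,
) -> Tuple[float, T]:
    """Evolve one individual per step and return the best (score, individual)."""
    # Score every initial individual once; scores are cached with the individual.
    population = [(fitness(ind), ind) for ind in init_population]
    for _ in range(steps):
        # Draw a random tournament and locate its strongest and weakest members.
        contenders = rng.sample(range(len(population)), tournament_size)
        best = max(contenders, key=lambda i: population[i][0])
        worst = min(contenders, key=lambda i: population[i][0])
        # Mutate the tournament winner and let the child replace the loser,
        # so the population size stays constant (assumed policy).
        child = mutate(population[best][1], rng)
        population[worst] = (fitness(child), child)
    return max(population, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Toy problem: integers evolving toward 42.
    rng = random.Random(0)
    start = [rng.randint(0, 100) for _ in range(16)]
    score, winner = steady_state_evolution(
        start,
        fitness=lambda x: -abs(x - 42),
        mutate=lambda x, r: x + r.choice([-1, 1]),
        tournament_size=4,
        steps=200,
        rng=rng,
    )
    print(score, winner)
```

In the setting described here, `fitness` would stand for the single-epoch training and scoring of a candidate, and `mutate` for a perturbation of one block primitive.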

A composite scoring function directs selection toward models near a target multiply-add budget. It combines the accuracy of a candidate with a weighting term that penalizes computationally expensive models. Population quality is tracked via a separate population-level metric. The exact expressions, the target budget, and the constants (including the point at which the weighting transitions) are given in Fang et al. (2019).
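One common way to build such a budget-aware score is the soft-constraint objective used by MnasNet, in which accuracy is multiplied by the ratio of cost to target raised to a negative exponent. The sketch below uses that form purely as an illustration: the functional form, the exponent value of -0.07 (MnasNet's default), the 550 M target (taken from the approximate budget mentioned in Section 5), and the function name `budget_score` are assumptions and are not confirmed as the exact EAT-NAS definition.

```python
def budget_score(accuracy: float, mult_adds: float, target_mult_adds: float,
                 omega: float = -0.07) -> float:
    """Accuracy scaled by (cost / target) ** omega.

    With a negative omega, models above the target are scored lower than
    their raw accuracy and models below the target are scored higher.
    """
    if mult_adds <= 0 or target_mult_adds <= 0:
        raise ValueError("multiply-add counts must be positive")
    return accuracy * (mult_adds / target_mult_adds) ** omega


if __name__ == "__main__":
    target = 550e6  # illustrative budget in multiply-adds
    for name, acc, ops in [("on budget", 0.742, 550e6),
                           ("over budget", 0.743, 886e6)]:
        print(name, round(budget_score(acc, ops, target), 4))
```

With this kind of score, a slightly more accurate model that overshoots the budget by a wide margin ranks below a model that meets it, which matches the selection pressure described above.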

After convergence, the top 8 candidates are re-trained for 200 epochs at high resolution, and the top performer (EATNet-B) is reported.

4. Performance and Comparative Metrics

EATNet-B achieves, on ImageNet validation (single-crop 224×224):

  • Top-1: 74.2 %
  • Top-5: 91.8 %
  • Model size: 5.3 M parameters, 551 M multi-adds
  • Search cost: ≈ 856 GPU-hours (≈ 4 days on 8×Titan X)

For comparison:

Model      Params (M)   Multiply-Adds (M)   Top-1 (%)   Top-5 (%)   GPU-Hours
EATNet-A   5.1          563                 74.7        92.0        ~856
EATNet-B   5.3          551                 74.2        91.8        ~856
MnasNet    4.2          317                 74.0        91.8        ~91,000
NASNet-A   5.3          564                 74.0        91.6        ~48,000

Relative to EATNet-A, EATNet-B trades a 0.5 percentage-point drop in Top-1 accuracy for a saving of ~12 M multiply-adds. Compared with RL- or hand-transfer-based NAS, EATNet-B obtains equal or higher Top-1 accuracy at a far lower search cost: roughly 56× fewer GPU-hours than NASNet-A and roughly 106× fewer than MnasNet (Fang et al., 2019).
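The search-cost comparison can be checked directly from the GPU-hour column of the table. The short script below only reproduces that arithmetic; the dictionary and function names are illustrative.

```python
# Approximate GPU-hours as reported in the comparison table.
GPU_HOURS = {
    "EATNet-B": 856,
    "MnasNet": 91_000,
    "NASNet-A": 48_000,
}


def speedup(baseline: str, ours: str = "EATNet-B") -> float:
    """How many times fewer GPU-hours `ours` needs than `baseline`."""
    return GPU_HOURS[baseline] / GPU_HOURS[ours]


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Wall-clock days if the GPU-hours are spread evenly over `num_gpus`."""
    return gpu_hours / num_gpus / 24


if __name__ == "__main__":
    for name in ("NASNet-A", "MnasNet"):
        print(f"{name}: {speedup(name):.1f}x more GPU-hours than EATNet-B")
    print(f"8 GPUs: {wall_clock_days(GPU_HOURS['EATNet-B'], 8):.2f} days")
```

Both ratios exceed 50×, and the 856 GPU-hours correspond to between four and five days on eight GPUs, in line with the search cost quoted above.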

5. Empirical Analyses and Ablations

Empirical studies demonstrate the effectiveness of elastic transfer. Initializing the population with elastically mutated versions of a strong CIFAR-10 seed yields an average 5 % boost in accuracy during early generations versus scratch search, and reduces convergence from over 200 to approximately 100 generations, roughly halving the number of generations needed. The >50× computational saving refers to total search cost relative to the baseline NAS methods in Section 4 (≈856 GPU-hours versus ≈48,000 for NASNet-A, about 56×). Directly hand-transferring the basic CIFAR-10 model without adaptation overshoots the compute budget (886 M multiply-adds for 74.3 % Top-1), whereas EAT-NAS re-optimizes to meet the multiply-add budget (~550 M) while keeping accuracy comparable.

Seed quality is essential; using a suboptimal CIFAR-10 model as the basis degrades convergence and population accuracy, confirming the transfer mechanism’s sensitivity to source architecture performance. Trade-offs between accuracy and computational cost are achieved through the multi-objective search, exemplified by the EATNet-A versus EATNet-B comparison.

6. Mathematical Formulations and Algorithmic Details

Key equations governing model scoring and convergence within the EAT-NAS process are:

  • Pareto model scoring: the accuracy versus multiply-add trade-off score introduced in Section 3.

  • Population quality: the population-level metric introduced in Section 3.

The exact expressions and their constants are given in Fang et al. (2019).

  • Elastic architecture perturbation: implemented as described in Algorithm 3 of the source.

These mathematical formulations enable EATNet-B’s robust and transferable optimization, enforcing explicit constraints on computational complexity while preserving competitive accuracy.

7. Significance and Implications

EATNet-B’s design and discovery process exemplify practical solutions to scaling NAS for large datasets under limited compute. The elastic architecture transfer yields hardware-efficient models, outperforms naïve small-to-large transfer and large-scale from-scratch NAS, and demonstrates that leveraging high-quality seeds with localized, stochastic modifications is effective in high-dimensional architecture spaces. A plausible implication is the general utility of such transfer mechanisms across NAS domains where computational cost or dataset scale otherwise prohibits exhaustive search (Fang et al., 2019).
