EATNet-A: Elastic Architecture Transfer Model

Updated 23 April 2026

The paper introduces an elastic architecture transfer method that uses an architecture found by search on CIFAR-10 as the starting point for architecture search on ImageNet.
The resulting model, EATNet-A, uses a MobileNet-style macro-structure with MBConv and depthwise-separable convolutions and has 5.1M parameters and 563M FLOPs.
EATNet-A achieves competitive accuracy (74.7% top-1, 92.0% top-5) while cutting search cost roughly 100-fold relative to the RL-based MnasNet.

EATNet-A is the best-performing model found on ImageNet by EAT-NAS (Elastic Architecture Transfer for Neural Architecture Search), a framework that accelerates large-scale neural architecture search by transferring architectural knowledge from a small-scale dataset to a large-scale one. EATNet-A has a MobileNet-style macro-structure and illustrates architecture-level transfer through evolutionary search that explicitly adapts multiple architecture primitives. It delivers competitive accuracy under tight computational budgets, i.e. in the “mobile” setting (Fang et al., 2019).

1. Architectural Specification

EATNet-A employs a staged, block-wise architecture with extensive use of inverted-bottleneck (MBConv) and depthwise-separable convolution operations. The network is structured as follows:

Stage	Block Type	Kernel Size(s)	Channels (Input→Output)	Stride	Depth (Repeats)	Output Size
Stem	Conv3×3	3	3→32	2	1	–
Stage 1	MBConv (exp=6)	3	32→32	1	2	112×112
Stage 2	SepConv	5	32→64	2	2	56×56
Stage 3	MBConv (exp=6)	3	64→64	1	3	56×56
Stage 4	MBConv (exp=6)	7	64→112	2	4	28×28
Stage 5	MBConv (exp=6)	3	112→256	1	4	28×28
Stage 6	MBConv (exp=6)	3	256→512	2	1	14×14
Classifier	GlobalAvgPool, FC	–	–	–	–	–

All convolutional layers use batch normalization and ReLU6 activations, applied after the expansion, depthwise, and projection convolutions. The classifier consists of global average pooling, a fully connected layer over the 1000 ImageNet classes, and a softmax. The total parameter count is approximately 5.1 million, with roughly 563 million multiply-adds.
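As a cross-check on the stage table, the sketch below encodes the stages as data and propagates the spatial resolution through them. It assumes a 224×224 input, "same" padding, and that only the first block of each stage applies the stride (the usual MobileNet-v2 convention); none of these is stated in the text, and the `Stage` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    block: str
    kernel: int
    c_in: int
    c_out: int
    stride: int
    repeats: int


# Stem plus six stages, transcribed from the architecture table.
EATNET_A_STAGES = [
    Stage("stem", "conv3x3", 3, 3, 32, 2, 1),
    Stage("stage1", "mbconv_exp6", 3, 32, 32, 1, 2),
    Stage("stage2", "sepconv", 5, 32, 64, 2, 2),
    Stage("stage3", "mbconv_exp6", 3, 64, 64, 1, 3),
    Stage("stage4", "mbconv_exp6", 7, 64, 112, 2, 4),
    Stage("stage5", "mbconv_exp6", 3, 112, 256, 1, 4),
    Stage("stage6", "mbconv_exp6", 3, 256, 512, 2, 1),
]


def output_sizes(stages, input_size=224):
    """Spatial size after each stage, assuming 'same' padding and that only
    the first block of a stage is strided (so repeats do not shrink it)."""
    sizes = {}
    size = input_size
    for stage in stages:
        size = -(-size // stage.stride)  # ceiling division
        sizes[stage.name] = size
    return sizes


def channels_chain(stages):
    """True if every stage consumes exactly what the previous one produces."""
    return all(prev.c_out == nxt.c_in for prev, nxt in zip(stages, stages[1:]))


print(output_sizes(EATNET_A_STAGES))
print(channels_chain(EATNET_A_STAGES))
```

Under these assumptions the computed sizes reproduce the Output Size column (ending at 14×14 after Stage 6), and the channel widths chain consistently from stage to stage.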

2. Elastic Architecture Transfer Methodology

The elastic transfer mechanism uses a “basic” architecture, first found by search on CIFAR-10, as the seed for initializing the search population on ImageNet. The seed is diversified by perturbing its five architecture primitives: operation type, kernel size, skip-connection flag, width factor, and depth factor. For each block in the seed, one primitive is sampled uniformly and replaced with a new random value from its domain.

During evolution, the same perturbation operator serves as the mutation operator, continually encouraging exploration at the architectural level. This lets the transferred architecture adapt to the scale and characteristics of the target dataset while still converging quickly on large datasets.
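A minimal sketch of the perturbation operator described above, used both to diversify the seed and as mutation. The value domains are hypothetical (the text names the five primitives but not their allowed values), and so is the choice to force the resampled value to differ from the current one.

```python
import random

# Hypothetical value domains: the text names the five primitives but not
# their allowed values, so these sets are illustrative assumptions.
DOMAINS = {
    "op": ["mbconv_exp3", "mbconv_exp6", "sepconv"],
    "kernel": [3, 5, 7],
    "skip": [False, True],
    "width": [0.5, 0.75, 1.0, 1.25, 1.5],
    "depth": [1, 2, 3, 4],
}


def perturb(arch, rng):
    """Copy `arch`; in each block, pick one of the five primitives uniformly
    and resample it (assumption: the new value must differ from the old)."""
    new_arch = []
    for block in arch:
        block = dict(block)  # never modify the parent in place
        key = rng.choice(list(DOMAINS))
        alternatives = [v for v in DOMAINS[key] if v != block[key]]
        block[key] = rng.choice(alternatives)
        new_arch.append(block)
    return new_arch


def init_population(seed_arch, size, rng):
    """Diversify the CIFAR-10 seed into an initial ImageNet population."""
    return [perturb(seed_arch, rng) for _ in range(size)]


# Usage: the same operator seeds the population and later serves as mutation.
rng = random.Random(0)
seed = [{"op": "mbconv_exp6", "kernel": 3, "skip": True, "width": 1.0, "depth": 2}
        for _ in range(6)]
population = init_population(seed, 64, rng)
child = perturb(population[0], rng)
```

Because each block changes in exactly one primitive, every individual stays close to the seed, which is what makes the CIFAR-10 architecture a useful starting point rather than just one random sample.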

3. Evolutionary Search Framework

EATNet-A is identified via a tournament-based evolutionary algorithm (EA) executed in the ImageNet search space. The process is summarized as follows:

Population size: 64
Tournament sample size: 16
Number of generations: ~100 (≈164 models evaluated in total, consistent with 64 initial individuals plus one new candidate per generation)
Candidate generation: Each new candidate is produced by mutating the highest-scoring individual in the tournament sample, and it replaces the lowest-scoring member of the population.
Mutation: For each block, one of the five primitives is chosen uniformly at random (probability 1/5 each) and perturbed.
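The loop below sketches the steady-state tournament procedure listed above with the stated population size (64), tournament size (16), and ~100 generations. The genome, `toy_fitness`, and `toy_mutate` are hypothetical stand-ins for the real search space, one-epoch proxy training, and the architecture perturbation; only the selection and replacement logic mirrors the text.

```python
import random


def tournament_search(initial, fitness, mutate, generations, sample_size, rng):
    """Steady-state tournament evolution: each generation samples
    `sample_size` individuals, mutates the best of that sample, and the child
    replaces the lowest-scoring member of the whole population."""
    population = [(ind, fitness(ind)) for ind in initial]
    evaluated = len(population)
    for _ in range(generations):
        tournament = rng.sample(population, sample_size)
        parent, _ = max(tournament, key=lambda pair: pair[1])
        child = mutate(parent, rng)
        worst = min(range(len(population)), key=lambda i: population[i][1])
        population[worst] = (child, fitness(child))
        evaluated += 1
    return max(population, key=lambda pair: pair[1]), evaluated


# Hypothetical toy problem standing in for "train one epoch, then score()".
KERNELS = [3, 5, 7]


def toy_fitness(genome):
    return sum(1 for k in genome if k == 5)


def toy_mutate(genome, rng):
    child = list(genome)
    child[rng.randrange(len(child))] = rng.choice(KERNELS)
    return child


rng = random.Random(0)
initial = [[rng.choice(KERNELS) for _ in range(8)] for _ in range(64)]
(best, best_score), n_evaluated = tournament_search(
    initial, toy_fitness, toy_mutate, generations=100, sample_size=16, rng=rng)
print(best_score, n_evaluated)  # n_evaluated is 64 + 100 = 164
```

Replacing the worst member (rather than, say, the oldest) means the best score in the population can never decrease while the population has more than one member.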

The fitness or “score” of each candidate model $M$ is computed as:

$\textrm{score}(M) = \textrm{acc}(M) \times \left[\frac{\mathrm{Size}(M)}{T}\right]^{\omega_0}$

where $\mathrm{Size}(M)$ is the number of multiply-adds, $T = 500\,\mathrm{M}$ is the reference multiply-add budget, and $\omega_0 = -0.07$ is the exponent balancing accuracy against efficiency. Because $\omega_0 < 0$, models larger than $T$ are penalized and smaller ones are mildly rewarded. The search is terminated based on population-level statistics, such as the mean and standard deviation of the scores within the population.
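The score is a direct transcription of the formula; this short sketch (function name hypothetical) makes the effect of $\omega_0 = -0.07$ concrete.

```python
def score(acc, macs, target=500e6, w0=-0.07):
    """score(M) = acc(M) * (Size(M) / T) ** w0, with Size(M) in multiply-adds."""
    return acc * (macs / target) ** w0


# At the reference budget the penalty factor is exactly 1.
print(score(0.75, 500e6))            # 0.75
# Doubling the budget multiplies the score by 2 ** -0.07, about 0.953.
print(score(0.75, 1000e6) / 0.75)
# Halving it multiplies the score by 2 ** 0.07, about 1.050.
print(score(0.75, 250e6) / 0.75)
```

So a model with twice the reference multiply-adds must be roughly 5% more accurate (in relative terms) just to tie a model at the budget.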

The search is markedly efficient: 22 hours on 4 GPUs plus 4 days on 8 GPUs (total ≈856 GPU-hours).
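The quoted total follows from multiplying wall-clock time by GPU count for each stage; a short check (helper name hypothetical):

```python
def gpu_hours(phases):
    """Total GPU-hours: wall-clock hours times GPU count, summed over phases."""
    return sum(hours * gpus for hours, gpus in phases)


# (wall-clock hours, GPUs) for the two reported stages of the search
phases = [(22, 4), (4 * 24, 8)]
total = gpu_hours(phases)   # 88 + 768 = 856
gpu_days = total / 24       # about 35.7
print(total, round(gpu_days, 1))
```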

4. Training Procedures

During search, each candidate model is trained for one epoch with stochastic gradient descent (SGD): learning rate 0.05, momentum 0.9, weight decay $3\times10^{-4}$, batch size 128.

The final selected EATNet-A architecture is retrained from scratch for 200 epochs using the following regimen:

SGD optimizer with momentum 0.9, weight decay $4\times10^{-5}$
Batch size: 256 over 4 GPUs
Initial learning rate: 0.1, polynomially decayed to $1\times10^{-4}$
Label smoothing: $\epsilon=0.1$
Data augmentation: random resized crops (area in [0.08, 1.0], aspect ratio in [3/4, 4/3]), horizontal flips, per-pixel mean subtraction
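The two training regimes can be captured as plain configuration, together with the learning-rate schedule and label-smoothed targets. This is a sketch: the `SGDConfig` class and helper names are hypothetical, the decay exponent `power` is not given in the text (1.0 is assumed), and the smoothing uses the common uniform eps/K variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SGDConfig:
    epochs: int
    base_lr: float
    momentum: float
    weight_decay: float
    batch_size: int


# Values transcribed from the text above.
SEARCH_CFG = SGDConfig(epochs=1, base_lr=0.05, momentum=0.9,
                       weight_decay=3e-4, batch_size=128)
FINAL_CFG = SGDConfig(epochs=200, base_lr=0.1, momentum=0.9,
                      weight_decay=4e-5, batch_size=256)


def poly_lr(step, total_steps, base_lr=0.1, end_lr=1e-4, power=1.0):
    """Polynomial decay from base_lr to end_lr over total_steps.
    The exponent `power` is not given in the text; 1.0 is an assumption."""
    frac = min(step, total_steps) / total_steps
    return (base_lr - end_lr) * (1.0 - frac) ** power + end_lr


def smooth_labels(label, num_classes=1000, eps=0.1):
    """Uniform label smoothing: eps/K on every class plus (1 - eps) on the true one."""
    target = [eps / num_classes] * num_classes
    target[label] += 1.0 - eps
    return target


print(poly_lr(0, 1000), poly_lr(500, 1000), poly_lr(1000, 1000))
```

With `power=1.0` the schedule is a straight line from 0.1 down to $10^{-4}$; a larger exponent would front-load the decay.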

5. Computational Cost Analysis

EATNet-A demonstrates substantial computational savings in large-scale NAS:

Total search cost: ≈856 GPU-hours (≈35.7 GPU-days; the 4-day, 8-GPU stage alone accounts for 32 GPU-days)
For comparison, MnasNet (an RL-based NAS method) requires ≈91,000 GPU-hours for the same task
Inference complexity: 5.1M parameters, 563M FLOPs

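For standard convolutions, the total multiply-add count can be estimated by summing over layers $\ell$:

${\rm FLOPs} = \sum_{\ell} H_\ell W_\ell K_\ell^2 c_{\rm in}^\ell c_{\rm out}^\ell$

where $H_\ell \times W_\ell$ is the output feature-map size of layer $\ell$, $K_\ell$ its kernel size, and $c_{\rm in}^\ell$, $c_{\rm out}^\ell$ its input and output channel counts. For a depthwise convolution, each channel is filtered independently, so the product $c_{\rm in}^\ell c_{\rm out}^\ell$ is replaced by the single channel count $c^\ell$.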
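A minimal sketch of this estimate, extended to depthwise convolutions and MBConv blocks. It assumes an MBConv block is a 1×1 expansion, a strided K×K depthwise convolution, and a 1×1 projection, that the input size is divisible by the stride, and it ignores batch-norm, activation, and bias costs; the function names are hypothetical.

```python
def conv_macs(h_out, w_out, k, c_in, c_out):
    """Standard convolution: H * W * K^2 * c_in * c_out."""
    return h_out * w_out * k * k * c_in * c_out


def depthwise_macs(h_out, w_out, k, c):
    """Depthwise convolution: one K x K filter per channel."""
    return h_out * w_out * k * k * c


def mbconv_macs(h_in, w_in, c_in, c_out, k, expansion=6, stride=1):
    """1x1 expand at input resolution, strided KxK depthwise, 1x1 project."""
    hidden = c_in * expansion
    h_out, w_out = h_in // stride, w_in // stride
    expand = conv_macs(h_in, w_in, 1, c_in, hidden)
    dw = depthwise_macs(h_out, w_out, k, hidden)
    project = conv_macs(h_out, w_out, 1, hidden, c_out)
    return expand + dw + project


# Stem: 3x3 conv, 3 -> 32 channels, 112x112 output (assuming a 224x224 input).
print(conv_macs(112, 112, 3, 3, 32))        # 10838016
# Illustrative MBConv (exp=6, K=3) at 14x14 with 32 -> 32 channels.
print(mbconv_macs(14, 14, 32, 32, 3))       # 2747136
```

In the MBConv example the two 1×1 convolutions cost 1,204,224 multiply-adds each while the depthwise step costs only 338,688, which is why the pointwise layers dominate the budget of mobile architectures.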

At 563M multiply-adds, EATNet-A falls within the “mobile” model regime (300M–600M FLOPs).

6. Performance Evaluation and Comparative Analysis

On the ImageNet classification task under the mobile constraint, EATNet-A achieves:

Top-1 accuracy: 74.7%
Top-5 accuracy: 92.0%

Representative comparison with peer models:

Model	Params (M)	FLOPs (M)	Top-1 / Top-5 Acc. (%)	Search Cost (GPU-hours)
EATNet-A	5.1	563	74.7 / 92.0	≈856
MnasNet	4.2	317	74.0 / 91.8	≈91,000
MobileNet-v2*	6.9	585	74.7 / —	—
NASNet-A	5.3	564	74.0 / 91.6	≈48,000

*MobileNet-v2-1.4× variant.

EATNet-A matches or exceeds the top-1 accuracy of every model in the table. It ties MobileNet-v2-1.4× while using fewer parameters (5.1M vs. 6.9M) and fewer FLOPs (563M vs. 585M), and its search uses about 106× less compute than MnasNet and about 56× less than NASNet-A.
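The comparison claims can be checked directly from the table. This sketch transcribes its rows (the dictionary layout and helper name are illustrative) and computes the search-cost ratios relative to EATNet-A.

```python
# Rows of the comparison table: (params M, MACs M, top-1 %, search GPU-hours)
MODELS = {
    "EATNet-A": (5.1, 563, 74.7, 856),
    "MnasNet": (4.2, 317, 74.0, 91_000),
    "MobileNet-v2-1.4x": (6.9, 585, 74.7, None),  # no search cost in the table
    "NASNet-A": (5.3, 564, 74.0, 48_000),
}


def search_cost_ratio(model, reference="EATNet-A"):
    """How many times more search GPU-hours `model` used than `reference`."""
    return MODELS[model][3] / MODELS[reference][3]


best_top1 = max(row[2] for row in MODELS.values())
print(MODELS["EATNet-A"][2] == best_top1)   # True (tied with MobileNet-v2-1.4x)
print(round(search_cost_ratio("MnasNet")))  # 106
print(round(search_cost_ratio("NASNet-A")))  # 56
```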

7. Significance in Neural Architecture Search

EATNet-A exemplifies the capability of elastic architecture transfer to overcome the scalability and efficiency barriers inherent to NAS on large datasets. The mechanism’s ability to adapt all five core architecture primitives leads to models that are simultaneously accurate and computationally tractable. The empirical trade-off between search cost, inference efficiency, and accuracy evidenced by EATNet-A and its peer comparison demonstrates the effectiveness of EAT-NAS’s knowledge transfer paradigm in practical, large-scale, resource-constrained deep learning contexts (Fang et al., 2019).


References

Fang et al. (2019). EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search.
