EATNet-A: Elastic Architecture Transfer Model
- The paper introduces a novel elastic architecture transfer method that adapts network design from CIFAR-10 to ImageNet to enhance performance.
- It employs a MobileNet-style macro-structure with MBConv and depthwise-separable convolutions, achieving 5.1M parameters and 563M FLOPs.
- EATNet-A delivers competitive accuracy (74.7% top-1, 92.0% top-5) while reducing search cost by nearly 100-fold over RL-based NAS methods.
EATNet-A is the best-performing model discovered on ImageNet by EAT-NAS (Elastic Architecture Transfer for Neural Architecture Search), a framework that accelerates large-scale neural architecture search by transferring architectural design knowledge from a small-scale dataset to a large-scale one. EATNet-A has a MobileNet-style macro-structure and illustrates architecture-level transfer through evolutionary search with explicit adaptation of multiple architecture primitives. It delivers competitive accuracy under a stringent computational budget, placing it in the “mobile” model setting (Fang et al., 2019).
1. Architectural Specification
EATNet-A employs a staged, block-wise architecture with extensive use of inverted-bottleneck (MBConv) and depthwise-separable convolution operations. The network is structured as follows:
| Stage | Block Type | Kernel Size(s) | Channels (Input→Output) | Stride | Depth (Repeats) | Output Size |
|---|---|---|---|---|---|---|
| Stem | Conv3×3 | 3 | 3→32 | 2 | 1 | – |
| Stage 1 | MBConv (exp=6) | 3 | 32→32 | 1 | 2 | 112×112 |
| Stage 2 | SepConv | 5 | 32→64 | 2 | 2 | 56×56 |
| Stage 3 | MBConv (exp=6) | 3 | 64→64 | 1 | 3 | 56×56 |
| Stage 4 | MBConv (exp=6) | 7 | 64→112 | 2 | 4 | 28×28 |
| Stage 5 | MBConv (exp=6) | 3 | 112→256 | 1 | 4 | 28×28 |
| Stage 6 | MBConv (exp=6) | 3 | 256→512 | 2 | 1 | 14×14 |
| Classifier | GlobalAvgPool, FC | – | – | – | – | – |
Batch normalization and ReLU6 activations are applied after the expansion, depthwise, and projection convolutions. The classifier consists of global average pooling, a fully connected layer for 1000 classes, and a softmax. The network has approximately 5.1 million parameters and requires roughly 563 million multiply-adds.
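The stage table can be written down as a small data structure, which makes the resolution and channel bookkeeping easy to check. This is a minimal sketch: the `Stage` class, the field names, the 224×224 input size, and the assumption that each stage applies its stride once with "same" padding are illustrative choices, not details given in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    block: str    # "conv", "mbconv", or "sepconv"
    kernel: int
    c_in: int
    c_out: int
    stride: int   # assumed to be applied once per stage
    repeats: int

# Rows copied from the table above.
STAGES = [
    Stage("stem",   "conv",    3,   3,  32, 2, 1),
    Stage("stage1", "mbconv",  3,  32,  32, 1, 2),
    Stage("stage2", "sepconv", 5,  32,  64, 2, 2),
    Stage("stage3", "mbconv",  3,  64,  64, 1, 3),
    Stage("stage4", "mbconv",  7,  64, 112, 2, 4),
    Stage("stage5", "mbconv",  3, 112, 256, 1, 4),
    Stage("stage6", "mbconv",  3, 256, 512, 2, 1),
]

def output_sizes(input_size=224):
    """Spatial size after each stage, assuming 'same' padding (ceil division)."""
    sizes = {}
    size = input_size
    for stage in STAGES:
        size = -(-size // stage.stride)  # ceil(size / stride)
        sizes[stage.name] = size
    return sizes

def channels_are_chained():
    """True if every stage's input width equals the previous stage's output width."""
    return all(a.c_out == b.c_in for a, b in zip(STAGES, STAGES[1:]))

sizes = output_sizes()
```

Under these assumptions the computed sizes agree with the Output Size column, and the channel widths chain consistently from the stem to Stage 6.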
2. Elastic Architecture Transfer Methodology
The elastic transfer mechanism leverages a “basic” architecture, first identified via search on CIFAR-10, as a seed for initialization on ImageNet. This seed is then diversified via perturbations of all five architecture primitives: operation type, kernel size, skip-connection flag, width factor, and depth factor. For each block in the seed, one primitive is uniformly sampled and replaced with a new random value from its domain.
During evolution, the same perturbation operator serves as the mutation, so exploration and adaptation continue at the architectural level throughout the search. This design lets the architecture adapt to the scale and characteristics of the target dataset while still converging quickly on large-scale datasets.
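The perturbation described above can be sketched as follows. The primitive domains in `DOMAINS`, the dictionary encoding of a block, and the function names are hypothetical; the sketch also assumes the replacement value always differs from the current one, which the text does not specify.

```python
import random

# Hypothetical domains for the five primitives (illustrative values only).
DOMAINS = {
    "op": ["mbconv", "sepconv"],
    "kernel": [3, 5, 7],
    "skip": [True, False],
    "width": [0.5, 1.0, 1.5, 2.0],
    "depth": [1, 2, 3, 4],
}

def perturb(arch, rng):
    """Return a copy of arch in which each block has exactly one primitive changed."""
    new_arch = []
    for block in arch:
        block = dict(block)                   # copy so the seed is left untouched
        primitive = rng.choice(list(DOMAINS)) # uniform over the five primitives
        choices = [v for v in DOMAINS[primitive] if v != block[primitive]]
        block[primitive] = rng.choice(choices)
        new_arch.append(block)
    return new_arch

def init_population(seed, size, rng):
    """Diversify a seed architecture into an initial population."""
    return [perturb(seed, rng) for _ in range(size)]

seed = [
    {"op": "mbconv", "kernel": 3, "skip": True, "width": 1.0, "depth": 2},
    {"op": "sepconv", "kernel": 5, "skip": False, "width": 1.0, "depth": 2},
]
population = init_population(seed, size=64, rng=random.Random(0))
```

Because `perturb` takes and returns a plain architecture, the same function can be reused unchanged as the mutation operator during evolution.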
3. Evolutionary Search Framework
EATNet-A is identified via a tournament-based evolutionary algorithm (EA) executed in the ImageNet search space. The process is summarized as follows:
- Population size: 64
- Tournament sample size: 16
- Number of generations: ~100 (approximately 164 models sampled)
- Candidate generation: Each new candidate is produced by mutating the current best individual (by score), and it replaces the lowest-scoring member of the population.
- Mutation: For each block, one of the five primitives is sampled uniformly (probability 1/5 each) and perturbed.
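The loop above can be sketched generically. This is a hedged illustration rather than the EAT-NAS implementation: it assumes the parent is the best member of a random tournament sample and that the child replaces the lowest-scoring member of the whole population. The toy `toy_fitness` and `toy_mutate` functions stand in for one-epoch training and primitive perturbation.

```python
import random

def evolve(population, fitness, mutate, cycles, sample_size, rng):
    """Tournament evolution; returns a list of (score, architecture) pairs."""
    scored = [(fitness(arch), arch) for arch in population]
    for _ in range(cycles):
        # Tournament: draw a random sample and pick its best member as parent.
        sample = rng.sample(range(len(scored)), sample_size)
        parent = max(sample, key=lambda i: scored[i][0])
        child = mutate(scored[parent][1], rng)
        # The child replaces the lowest-scoring member of the population.
        worst = min(range(len(scored)), key=lambda i: scored[i][0])
        scored[worst] = (fitness(child), child)
    return scored

# Toy stand-ins: an "architecture" is a list of ints, best when every entry is 3.
def toy_fitness(arch):
    return -sum(abs(x - 3) for x in arch)

def toy_mutate(arch, rng):
    child = list(arch)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child

rng = random.Random(0)
population = [[rng.randint(0, 6) for _ in range(4)] for _ in range(64)]
initial_best = max(toy_fitness(a) for a in population)
final = evolve(population, toy_fitness, toy_mutate,
               cycles=100, sample_size=16, rng=rng)
final_best = max(score for score, _ in final)
```

Since only the lowest-scoring member is ever replaced, the population size stays fixed and the best score in the population cannot decrease from one cycle to the next.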
The fitness or “score” of each candidate model is computed as:
$$\text{score} = \text{Acc} \times \left(\frac{F}{F_0}\right)^{\alpha}$$
where $\text{Acc}$ is the candidate's accuracy, $F$ is the number of multiply-adds, $F_0$ is the reference FLOPs, and $\alpha$ is the exponent balancing accuracy and efficiency. Search termination is governed by monitoring population-quality metrics, including the mean and standard deviation of the scores within the population.
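A score of this kind can be sketched as a weighted product of accuracy and a FLOPs ratio, in the style used by MnasNet. The function name, the argument names, and the numbers in the usage lines (including the exponent of -0.07 and the 600M reference) are illustrative assumptions, not values stated in the text.

```python
def score(accuracy, madds, ref_madds, exponent):
    """Weighted-product score: accuracy scaled by (madds / ref_madds) ** exponent.

    With a negative exponent, models above the reference budget are
    penalized and models below it are rewarded.
    """
    return accuracy * (madds / ref_madds) ** exponent

# Illustrative values only: same accuracy, different compute budgets.
at_budget = score(0.50, 600e6, 600e6, exponent=-0.07)
under_budget = score(0.50, 300e6, 600e6, exponent=-0.07)
over_budget = score(0.50, 900e6, 600e6, exponent=-0.07)
```

With these placeholder numbers, the model under the reference budget scores higher than the one at budget, which in turn scores higher than the one over budget, even though all three have the same accuracy.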
The search is markedly efficient: 22 hours on 4 GPUs plus 4 days on 8 GPUs (total ≈856 GPU-hours).
4. Training Procedures
During search, all candidate models are trained for one epoch with stochastic gradient descent (SGD): learning rate = 0.05, momentum = 0.9, weight decay = 3×10, batch size = 128.
The final selected EATNet-A architecture is retrained from scratch for 200 epochs using the following regimen:
- SGD optimizer with momentum 0.9, weight decay 4×10
- Batch size: 256 over 4 GPUs
- Initial learning rate: 0.1, polynomially decayed to 1×10
- Label smoothing:
- Data augmentation: random resized crops (area in [0.08, 1.0], aspect ratio in [3/4, 4/3]), horizontal flips, per-pixel mean subtraction
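Two parts of this regimen, the polynomial learning-rate decay and label smoothing, can be sketched in plain Python. The decay power, the final learning rate, and the smoothing coefficient `eps` are parameters here; the values in the usage lines are illustrative placeholders, not the settings used for EATNet-A.

```python
def poly_lr(step, total_steps, base_lr, end_lr, power=1.0):
    """Polynomial decay from base_lr (step 0) to end_lr (step total_steps)."""
    frac = min(step, total_steps) / total_steps
    return (base_lr - end_lr) * (1.0 - frac) ** power + end_lr

def smooth_labels(target, num_classes, eps):
    """Mix a one-hot target with a uniform distribution over all classes."""
    dist = [eps / num_classes] * num_classes
    dist[target] += 1.0 - eps
    return dist

# Placeholder settings for illustration only.
lr_start = poly_lr(0, 200, base_lr=0.1, end_lr=0.001)
lr_mid = poly_lr(100, 200, base_lr=0.1, end_lr=0.001)
lr_end = poly_lr(200, 200, base_lr=0.1, end_lr=0.001)
soft = smooth_labels(target=2, num_classes=5, eps=0.1)
```

With `power=1.0` the schedule is a straight line between the two rates; larger powers front-load the decay. The smoothed target still sums to one, with most of the mass on the true class.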
5. Computational Cost Analysis
EATNet-A demonstrates substantial computational savings in large-scale NAS:
- Total search cost: ≈856 GPU-hours (≈35.7 GPU-days, of which the 4-day, 8-GPU phase accounts for 32 GPU-days)
- For comparison, MnasNet (an RL-based NAS method) requires ≈91,000 GPU-hours for the same task
- Inference complexity: 5.1M parameters, 563M FLOPs
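The cost figures follow from simple arithmetic on the two search phases (22 hours on 4 GPUs, then 4 days on 8 GPUs). The short check below uses only numbers quoted in this article; the variable names are illustrative.

```python
# (wall-clock hours, number of GPUs) for each search phase
phases = [(22, 4), (4 * 24, 8)]

gpu_hours = sum(hours * gpus for hours, gpus in phases)  # 88 + 768
gpu_days = gpu_hours / 24

mnasnet_gpu_hours = 91_000   # figure quoted above for MnasNet
speedup = mnasnet_gpu_hours / gpu_hours

print(f"{gpu_hours} GPU-hours = {gpu_days:.1f} GPU-days, "
      f"{speedup:.0f}x less than MnasNet")
```

The first phase contributes 88 GPU-hours and the second 768, so nearly 90% of the search budget is spent in the 8-GPU phase.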
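A concise per-layer estimate of the multiply-adds of a convolution is provided by:
$$\text{MAdds} = H_{\text{out}} \, W_{\text{out}} \, K^{2} \, \frac{C_{\text{in}} \, C_{\text{out}}}{g}$$
where $H_{\text{out}} \times W_{\text{out}}$ is the output resolution, $K$ is the kernel size, $C_{\text{in}}$ and $C_{\text{out}}$ are the input and output channel counts, and $g$ is the number of groups ($g = 1$ for a standard convolution, $g = C_{\text{in}}$ for a depthwise convolution). Summed over all layers, this metric places EATNet-A within the “mobile” model domain (300M–600M FLOPs).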
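A per-layer multiply-add count of the standard form can be sketched as below. The function names are hypothetical, and the MBConv decomposition (1×1 expansion, depthwise convolution, 1×1 projection) follows the usual inverted-bottleneck layout rather than a layer-by-layer specification from the text. Biases, batch normalization, and activations are ignored.

```python
def conv_madds(h_out, w_out, kernel, c_in, c_out, groups=1):
    """Multiply-adds of one convolution: H_out * W_out * K^2 * (C_in / g) * C_out."""
    return h_out * w_out * kernel * kernel * (c_in // groups) * c_out

def mbconv_madds(h_in, w_in, kernel, c_in, c_out, stride, expansion=6):
    """Multiply-adds of one inverted-bottleneck block (stride in the depthwise conv)."""
    h_out, w_out = -(-h_in // stride), -(-w_in // stride)  # ceil division
    c_mid = c_in * expansion
    expand = conv_madds(h_in, w_in, 1, c_in, c_mid)
    depthwise = conv_madds(h_out, w_out, kernel, c_mid, c_mid, groups=c_mid)
    project = conv_madds(h_out, w_out, 1, c_mid, c_out)
    return expand + depthwise + project

# Stem from the architecture table: 3x3 conv, 3 -> 32 channels, 112x112 output.
stem = conv_madds(112, 112, 3, 3, 32)
# A depthwise 3x3 conv over 64 channels at 56x56, for comparison.
depthwise_only = conv_madds(56, 56, 3, 64, 64, groups=64)
```

The depthwise case shows why separable blocks are cheap: with `groups` equal to the channel count, the cost grows linearly rather than quadratically in the number of channels.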
6. Performance Evaluation and Comparative Analysis
On the ImageNet classification task under the mobile constraint, EATNet-A achieves:
- Top-1 accuracy: 74.7%
- Top-5 accuracy: 92.0%
Representative comparison with peer models:
| Model | Params (M) | FLOPs (M) | Top-1 / Top-5 Acc. (%) | Search Cost (GPU-hours) |
|---|---|---|---|---|
| EATNet-A | 5.1 | 563 | 74.7 / 92.0 | ≈856 |
| MnasNet | 4.2 | 317 | 74.0 / 91.8 | ≈91,000 |
| MobileNet-v2* | 6.9 | 585 | 74.7 / — | — |
| NASNet-A | 5.3 | 564 | 74.0 / 91.6 | ≈48,000 |
*MobileNet-v2-1.4× variant.
EATNet-A matches or exceeds the top-1 accuracy of every model in the table. It ties MobileNet-v2-1.4× at a lower FLOPs count (563M vs. 585M) and, by the search costs listed, uses roughly 106 times less search compute than MnasNet and roughly 56 times less than NASNet-A, both RL-based NAS methods.
7. Significance in Neural Architecture Search
EATNet-A exemplifies the capability of elastic architecture transfer to overcome the scalability and efficiency barriers inherent to NAS on large datasets. The mechanism’s ability to adapt all five core architecture primitives leads to models that are simultaneously accurate and computationally tractable. The empirical trade-off between search cost, inference efficiency, and accuracy evidenced by EATNet-A and its peer comparison demonstrates the effectiveness of EAT-NAS’s knowledge transfer paradigm in practical, large-scale, resource-constrained deep learning contexts (Fang et al., 2019).