AmoebaNet: Evolved Neural Architecture Search

Updated 5 February 2026
  • AmoebaNet is a family of evolved image classifiers that use a cell-based search space and aging evolution to automatically optimize architecture design.
  • The models feature normal and reduction cells configured with operations such as separable convolutions and pooling to construct robust computation graphs.
  • Aging evolution in AmoebaNet promotes exploration and regularization, leading to architectures that rival or outperform hand-crafted designs on benchmarks like CIFAR-10 and ImageNet.

AmoebaNet refers to a family of image classifier architectures automatically discovered through a regularized (aging) evolutionary algorithm in a NASNet-style search space. AmoebaNet-A, in particular, was the first evolved neural architecture to surpass hand-crafted designs on large-scale image classification benchmarks, demonstrating state-of-the-art performance on ImageNet with superior or competitive efficiency relative to architectures discovered using reinforcement learning-guided neural architecture search (Real et al., 2018).

1. Architecture Search Space and Cell Composition

The search was conducted in the cell-based search space introduced for NASNet, where an image classifier comprises a small input stem, three stacks of N identical "normal" cells, two reduction cells that halve the spatial resolution (placed after the first and second stacks), and a final pooling/softmax head.

In this framework, each cell is a directed acyclic graph carrying out exactly five "pairwise combinations." Each combination selects two hidden states (with replacement) and applies to each an operation from the set

  • identity,
  • separable convolution (3×3, 5×5, 7×7),
  • average pooling 3×3,
  • max pooling 3×3,
  • dilated separable convolution 3×3,
  • 1×7→7×1 convolution,

then sums the two outputs to form a new hidden state.

After the five steps, there are 2 (initial inputs) + 5 (new states) = 7 hidden states; those that are never used as inputs to later combinations (typically two) are concatenated depth-wise to yield the cell's output. Normal cells preserve spatial dimensions, while reduction cells perform the same steps with stride 2.
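This bookkeeping can be sketched in a few lines of Python (an illustration with hypothetical helper names, not the original implementation):

```python
# Minimal sketch of the cell bookkeeping described above. A cell holds five
# pairwise combinations; each picks two earlier hidden states, applies one
# operation to each, and sums them into a new state. Names are illustrative.
OPS = [
    "identity",
    "sep_conv_3x3", "sep_conv_5x5", "sep_conv_7x7",
    "avg_pool_3x3", "max_pool_3x3",
    "dil_sep_conv_3x3", "conv_1x7_7x1",
]

def unused_states(combinations):
    """Indices of hidden states never consumed by any combination.

    States 0 and 1 are the cell's two inputs; combination k (k = 0..4)
    creates state 2 + k. The unused states are concatenated depth-wise
    to form the cell's output.
    """
    n_states = 2 + len(combinations)          # 2 inputs + 5 new states = 7
    used = set()
    for _op1, in1, _op2, in2 in combinations:
        used.update((in1, in2))
    return [s for s in range(n_states) if s not in used]

# Toy cell: each step is (op1, input1, op2, input2)
toy = [
    ("sep_conv_3x3", 0, "avg_pool_3x3", 1),
    ("sep_conv_5x5", 1, "identity", 0),
    ("max_pool_3x3", 2, "sep_conv_3x3", 3),
    ("sep_conv_7x7", 0, "identity", 1),
    ("avg_pool_3x3", 4, "sep_conv_5x5", 2),
]
print(unused_states(toy))  # → [5, 6]
```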

2. Regularized (Aging) Evolution Algorithm

Aging evolution maintains a population of P trained models, each with an implicit age (the number of cycles it has survived). At each evolutionary cycle:

  • S individuals are uniformly randomly sampled (with replacement) from the population,
  • The one with highest validation accuracy (the "tournament winner") is selected,
  • A child architecture is produced by mutating the winner,
  • The child is trained from scratch, its validation accuracy is measured and its age set to zero,
  • The child is inserted into the population and the oldest individual (largest age) is removed,
  • The ages of all remaining individuals are incremented by one.
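The cycle above can be sketched as a steady-state loop (a minimal illustration; `random_arch`, `mutate`, and `train_and_eval` are stand-ins for the expensive training job and the paper's mutation operators):

```python
# Sketch of aging evolution: tournament selection plus oldest-out removal.
# A FIFO deque makes ages implicit — the left end is always the oldest
# individual, so popleft() culls it without explicit age counters.
import random
from collections import deque

def aging_evolution(random_arch, mutate, train_and_eval, P=100, S=25, cycles=1000):
    population = deque()   # left end = oldest individual
    history = []           # every model ever evaluated
    while len(population) < P:                        # seed with random models
        arch = random_arch()
        individual = (arch, train_and_eval(arch))
        population.append(individual)
        history.append(individual)
    for _ in range(cycles):
        # sample S individuals uniformly, with replacement
        sample = [random.choice(population) for _ in range(S)]
        parent = max(sample, key=lambda ind: ind[1])      # tournament winner
        child_arch = mutate(parent[0])
        child = (child_arch, train_and_eval(child_arch))  # train from scratch
        population.append(child)                          # enters at age 0
        population.popleft()                              # cull the oldest
        history.append(child)
    return max(history, key=lambda ind: ind[1])           # best model ever seen
```

With a toy "architecture" (a number whose validation accuracy is itself), the loop climbs steadily, mirroring how tournament selection exploits good candidates while aging keeps turnover high.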

This age-based culling prevents the population from being dominated by a few lucky models, enforcing continual exploration and providing implicit regularization: it biases the search toward architectures that retrain well, not just those with transiently high validation scores. Standard non-aging evolution, in contrast, removes the lowest-accuracy model among the sampled set at each tournament.

The evolution is managed asynchronously, supporting large GPU or TPU clusters with minimal resource idle time, and controlled by only two meta-parameters: population size P and tournament sample size S.

3. Discovered Cell Topologies: AmoebaNet-A

AmoebaNet-A was identified by running 20,000 search-phase models (CIFAR-10; P=100, S=25), selecting the single best as the discovered architecture. The details of the normal and reduction cells are as follows:

AmoebaNet-A Normal Cell:

Step | Operation 1 | Input 1 | Operation 2 | Input 2
1 | separable conv 3×3 | h₀ | avg pool 3×3 | h₁
2 | separable conv 5×5 | h₁ | separable conv 7×7 | h₀
3 | max pool 3×3 | h₀ | separable conv 3×3 | h₂
4 | separable conv 7×7 | h₂ | separable conv 5×5 | h₃
5 | separable conv 3×3 | h₁ | max pool 3×3 | h₄

Unused hidden states {h₀, h₂} are concatenated as the cell output.

AmoebaNet-A Reduction Cell:

Step | Operation 1 | Input 1 | Operation 2 | Input 2
1 | separable conv 5×5 | h₀ | separable conv 3×3 | h₁
2 | avg pool 3×3 | h₂ | separable conv 7×7 | h₀
3 | separable conv 3×3 | h₃ | avg pool 3×3 | h₂
4 | max pool 3×3 | h₁ | separable conv 5×5 | h₃
5 | separable conv 7×7 | h₄ | separable conv 3×3 | h₀

Unused hidden states {h₁, h₃} are concatenated for the output.

A compact encoding for either cell is a tuple of five two-operation steps:

[(o_{1,1}, i_{1,1}; o_{1,2}, i_{1,2}), …, (o_{5,1}, i_{5,1}; o_{5,2}, i_{5,2})]

4. Hyper-parameters and Training Protocols

Search Phase

  • P=100, S=25, identity (no-op) mutation probability 0.05, otherwise equal probability for hidden-state vs. op mutation.
  • Hidden-state mutation: pick one cell (normal or reduction), choose one operand of one of its five combinations, and rewire its source to another allowed hidden state; op mutation: replace the chosen operation with a random one from the 8-op set.
  • Model size during search: N=3, F=24.
  • Each architecture is trained for 25 epochs on CIFAR-10 (45,000 train / 5,000 validation images), batch size 128, SGD with momentum 0.9, learning rate schedule comparable to RL baselines.
  • Search utilized 450 K40 GPUs in parallel to evaluate ~20,000 models in 7 days, for a total of ~75,600 GPU-hours.
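The two mutation operators described above can be sketched as follows (a minimal illustration on a single cell, using a hypothetical list-of-tuples representation; the search applies this to one of the two cells in the architecture):

```python
# Sketch of the search-phase mutations: a cell is a list of five steps
# (op1, input1, op2, input2). With probability 0.05 the mutation is a no-op;
# otherwise a hidden-state or op mutation is applied with equal probability.
import random

OPS = ["identity", "sep_conv_3x3", "sep_conv_5x5", "sep_conv_7x7",
       "avg_pool_3x3", "max_pool_3x3", "dil_sep_conv_3x3", "conv_1x7_7x1"]

def mutate(cell, p_identity=0.05):
    """Return a mutated copy of the cell."""
    cell = [list(step) for step in cell]          # work on a copy
    if random.random() < p_identity:
        return [tuple(s) for s in cell]           # identity (no-op) mutation
    step = random.randrange(len(cell))            # pick one combination
    operand = random.choice([0, 2])               # (op1, in1) or (op2, in2)
    if random.random() < 0.5:
        # hidden-state mutation: rewire the source to any earlier state
        # (may pick the same source, which leaves the cell unchanged)
        cell[step][operand + 1] = random.randrange(2 + step)
    else:
        # op mutation: swap in a random operation from the 8-op set
        cell[step][operand] = random.choice(OPS)
    return [tuple(s) for s in cell]
```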

Final Training

  • After selection, architectures are scaled up (larger N, F) and retrained with longer schedules and heavier regularization (ScheduledDropPath p=0.7; auxiliary softmax with weight 0.5 on CIFAR-10, 0.4 on ImageNet; augmentations such as Cutout and AutoAugment).
  • CIFAR-10 final: N=6, F=32 (or F=36), SGD with momentum 0.9, weight decay 5×10⁻⁴, initial learning rate 0.024 with cosine decay, 600 epochs, batch size 128.
  • ImageNet final models: Medium: N=6, F=190 (~86.7M params); Large: N=6, F=448 (~469M params); optimizer: distributed RMSProp with decay 0.9 and ε=0.1, weight decay 4×10⁻⁵, initial learning rate 0.001 (decayed by 0.97 every 2 epochs), label smoothing 0.1, batch size ≈1024, 350 epochs, 100 P100 GPUs.
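The two learning rate schedules quoted above can be written as plain functions (a sketch; the exact framework implementations may differ in warmup and step granularity):

```python
# The two decay schedules from the final-training recipes above.
import math

def cosine_lr(epoch, total_epochs=600, lr0=0.024):
    """CIFAR-10 schedule: cosine decay from lr0 down to 0."""
    return 0.5 * lr0 * (1 + math.cos(math.pi * epoch / total_epochs))

def imagenet_lr(epoch, lr0=0.001, decay=0.97, every=2):
    """ImageNet schedule: multiply the rate by 0.97 every 2 epochs."""
    return lr0 * decay ** (epoch // every)
```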

5. Empirical Performance and Cost

CIFAR-10 (with augmentation):

Model | Params | Test Error (%)
NASNet-A | 3.3M | 3.41
AmoebaNet-A (6×32) | 2.6M | 3.40 ± 0.08
AmoebaNet-A (6×36) | 3.2M | 3.34 ± 0.06

ImageNet (single-crop):

Model | Params | FLOPs | Top-1 / Top-5 (%)
Inception-ResNet-V2 | 55.8M | 13.2B | 80.4 / 95.3
ResNeXt-101 (64×4d) | 83.6M | 31.5B | 80.9 / 95.6
PolyNet | 92.0M | 34.7B | 81.3 / 95.8
NASNet-A (RL) | 88.9M | 23.8B | 82.7 / 96.2
PNASNet-5 | 86.1M | 25.0B | 82.9 / 96.2
AmoebaNet-A (6×190) | 86.7M | 23.1B | 82.8 / 96.1
AmoebaNet-A (6×448) | 469M | 104B | 83.9 / 96.6

Search compute cost: ~20,000 model training jobs of 25 epochs each (one K40 GPU per job), totaling ~75,600 GPU-hours. Evolution reached 50% of its final accuracy in roughly half the time of an RL-based NAS controller, indicating efficiency especially when compute is constrained.
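The quoted total is consistent with the parallel-search figures given earlier (450 GPUs running for 7 days):

```python
# Sanity check of the search cost: 450 K40 GPUs x 7 days x 24 hours/day.
gpus, days = 450, 7
gpu_hours = gpus * days * 24
print(gpu_hours)  # → 75600
```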

6. Analysis and Extensions

Aging evolution regularizes the evolutionary process by biasing toward candidate architectures that perform robustly upon retraining. By eliminating the oldest model at each cycle, architectures must repeatedly prove their accuracy, reducing the risk that the population is dominated by overfitted or lucky candidates.

This approach maintains exploratory diversity while rapidly exploiting high-accuracy models due to tournament selection. The asynchronous, steady-state loop is well-suited for distributed computing environments and requires only two meta-parameters (PP and SS), simplifying meta-optimization compared to reinforcement learning-based methods.

Applicability was tested primarily in NASNet-style search spaces and image classification. Small-scale tests on MNIST, grayscale CIFAR-10, and a miniature ImageNet also favored aging evolution. However, efficacy on tasks outside image classification (e.g., NLP, detection) or in more expansive search spaces was not established. Aging evolution was not combined with progressive model-size training or predictor-guided search, both of which could plausibly enhance search efficiency.

Understanding which cell motifs, such as high fan-in, correlate with higher final accuracy was identified as a significant future avenue.

7. Summary and Outlook

AmoebaNet-A demonstrated that regularized evolution is a simple and effective approach for neural architecture search, generating models that matched or exceeded the accuracy, wall-clock efficiency, and scalability of RL-guided NAS models. The mechanism, search space, and training protocol developed in (Real et al., 2018) set a new precedent for leveraging evolutionary algorithms in automatic architecture discovery, offering a method with minimal controller overhead and favorable parallelization properties. Further validation on non-vision tasks and integration with other NAS accelerants remain open directions.
