- The paper demonstrates that evolutionary algorithms can autonomously design competitive neural network architectures with minimal human intervention.
- The methodology integrates weight inheritance and parallel training, achieving 94.6% accuracy on CIFAR-10 and 77.0% on CIFAR-100.
- The study underscores evolutionary strategies' potential to democratize high-performance model design and reduce manual engineering.
Large-Scale Evolution of Image Classifiers
The paper "Large-Scale Evolution of Image Classifiers" by Real et al. explores the application of evolutionary algorithms (EAs) to automate the discovery and optimization of neural network architectures for image classification tasks, particularly on the CIFAR-10 and CIFAR-100 datasets. The paper demonstrates that EAs can yield high-performing neural networks without human intervention once the evolutionary process is initiated.
Overview
The conventional approach to designing neural network architectures requires substantial human expertise and iterative experimentation. This paper's primary objective is to reduce human involvement by running evolutionary algorithms at unprecedented computational scale to discover and train image classification models. The networks obtained this way achieve accuracies within the range of recently published hand-designed models.
Methodology
The authors employ a population-based EA in which each individual represents a neural network architecture encoded as a directed acyclic graph. Edges of the graph are convolutional layers or identity (skip) connections, while vertices apply activation functions such as batch-normalized ReLUs or plain linear units. The population evolves through selection, reproduction, and mutation.
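To make the encoding concrete, here is a minimal sketch of how such a graph-based genome might be represented. The class and field names are hypothetical, not taken from the paper, and details such as vertex ordering and shape constraints are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    """A vertex applies an activation: batch-norm + ReLU, or a plain linear unit."""
    activation: str = "bn_relu"  # or "linear"

@dataclass
class Edge:
    """An edge is either a convolution or an identity (skip) connection."""
    kind: str = "conv"      # or "identity"
    filter_size: int = 3    # meaningful only for conv edges
    stride: int = 1
    channels: int = 16

@dataclass
class DNA:
    """Genome: a DAG over vertices plus a learning rate, which also evolves."""
    vertices: list = field(default_factory=list)   # list of Vertex
    edges: dict = field(default_factory=dict)      # (src, dst) -> Edge
    learning_rate: float = 0.1
```

A mutation is then a small edit on this structure, such as inserting an edge or perturbing `learning_rate`.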
The evolutionary process is managed as follows (a code sketch tying the steps together appears after the list):
- Selection: Random pairs of individuals are chosen to compete based on their fitness, measured by validation accuracy. The poorer-performing individual is removed from the population.
- Mutation: Various mutations are applied to the winning individual, including modifying learning rates, inserting or removing convolutional layers or skip connections, and altering numerical parameters such as stride and filter size.
- Training and Validation: Each network is trained for a fixed number of steps using SGD with momentum. Fitness is evaluated on a validation set, which drives selection.
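The following sketch combines the three steps above into a single worker's loop. Here `mutate` and `train_and_eval` are placeholders for the paper's mutation operators and fixed-step SGD training; the function is an illustration, not the authors' code.

```python
import copy
import random

def evolve_step(population, mutate, train_and_eval):
    """One evolution step: binary tournament, then reproduce the winner.

    population: list of (dna, fitness) pairs; fitness is validation accuracy.
    mutate: takes a DNA, returns a mutated copy.
    train_and_eval: trains a DNA for a fixed number of SGD-with-momentum
    steps and returns its validation accuracy.
    """
    # Selection: draw a random pair and keep the fitter individual.
    a, b = random.sample(population, 2)
    winner, loser = (a, b) if a[1] >= b[1] else (b, a)
    population.remove(loser)  # the poorer performer is removed

    # Reproduction + mutation: the child is a mutated copy of the winner.
    child_dna = mutate(copy.deepcopy(winner[0]))

    # Training and validation: the child's fitness drives future selection.
    population.append((child_dna, train_and_eval(child_dna)))
```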
EAs are scaled using a lock-free, massively parallel infrastructure with asynchronous operations. Workers independently select, mutate, and train individuals, coordinating only through a shared file system; the authors claim this represents an unprecedented scale for neuro-evolution.
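A drastically simplified sketch of that coordination pattern, assuming individuals are pickled into files in a shared directory; the path, file naming, and atomic-rename trick are illustrative assumptions, not the authors' infrastructure.

```python
import os
import pickle
import uuid

POPULATION_DIR = "/shared/population"  # hypothetical shared file system path

def save_individual(dna, fitness):
    """Publish an individual atomically: write a temp file, then rename."""
    path = os.path.join(POPULATION_DIR, f"{uuid.uuid4().hex}.pkl")
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump((dna, fitness), f)
    os.rename(tmp, path)  # rename is atomic on POSIX file systems

def load_population():
    """Each worker reads the current population directly from disk."""
    individuals = []
    for name in os.listdir(POPULATION_DIR):
        if name.endswith(".pkl"):
            with open(os.path.join(POPULATION_DIR, name), "rb") as f:
                individuals.append((name, pickle.load(f)))
    return individuals

def remove_individual(name):
    """Deleting the loser's file 'kills' it; races are tolerated, not locked."""
    try:
        os.remove(os.path.join(POPULATION_DIR, name))
    except FileNotFoundError:
        pass  # another worker got there first; lock-free, so just move on
```

Because every operation either tolerates or shrugs off conflicts, no worker ever waits on a lock, which is what lets many workers run concurrently.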
Results
The evolved models demonstrated competitive performance on standard datasets:
- CIFAR-10: Achieved test accuracy of 94.6% with a single model and 95.6% with an ensemble, using approximately 4×10²⁰ FLOPs.
- CIFAR-100: Achieved test accuracy of 77.0% using 2×10²⁰ FLOPs without algorithmic modifications.
Variability across experiment runs was modest: the standard deviation of test accuracies on CIFAR-10 was 0.4%. The authors also examined how population size and the number of training steps per individual affect evolutionary outcomes, finding that increasing either improved model accuracy.
Analysis
A noteworthy finding is that weight inheritance during mutations let evolving networks reuse previously learned weights, significantly accelerating convergence toward high-performing architectures. This contrasts with traditional views in the deep learning community, which often consider evolutionary approaches less capable than hand-designed models.
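As a sketch of the weight-inheritance idea: after a mutation, any layer whose weight shape is unchanged can copy the parent's trained weights, while new or reshaped layers are freshly initialized. The helper below illustrates this under that assumption; it is not the paper's implementation.

```python
import numpy as np

def inherit_weights(parent_weights, child_shapes):
    """Copy parent weights into the child wherever shapes still match.

    parent_weights: dict of layer name -> trained weight array.
    child_shapes: dict of layer name -> expected weight shape after mutation.
    """
    child_weights = {}
    for name, shape in child_shapes.items():
        if name in parent_weights and parent_weights[name].shape == shape:
            child_weights[name] = parent_weights[name].copy()  # inherited
        else:
            # New or reshaped layers (e.g. after a filter-size mutation)
            # cannot inherit and are initialized from scratch.
            child_weights[name] = np.random.normal(0.0, 0.1, size=shape)
    return child_weights
```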
The authors ran control experiments to validate their approach:
- Random Search Control: Demonstrated significantly lower performance, illustrating the importance of the evolutionary process.
- Disabled Weight Inheritance: Resulted in reduced final accuracy, underscoring the value of inherited weights.
Furthermore, increasing the mutation rate and resetting weights were shown to help populations escape local optima, providing effective mechanisms for boosting exploration of the search space.
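A minimal sketch of those two escape mechanisms, with the details being assumptions rather than the paper's code: applying several mutations per reproduction instead of one, and optionally discarding inherited weights so the child trains from a fresh initialization.

```python
import copy

def reproduce(winner_dna, mutate, num_mutations=1, reset_weights=False):
    """Produce one child from a tournament winner.

    num_mutations: raise above 1 to increase the mutation rate when the
    population appears stuck in a local optimum.
    reset_weights: if True, discard inherited weights so the child is
    trained from scratch (assumes the genome carries its weights).
    """
    child = copy.deepcopy(winner_dna)
    for _ in range(num_mutations):
        child = mutate(child)
    if reset_weights:
        child.weights = None  # hypothetical field; forces re-initialization
    return child
```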
Implications and Future Directions
This paper's findings suggest several theoretical and practical implications:
- Theoretical: The EA's success in independently discovering competitive network architectures validates the potential of evolutionary strategies in machine learning, challenging the presumption that such methods are inherently inferior.
- Practical: The described approach could democratize access to high-performance models by reducing the need for exhaustive manual architecture design, allowing more focus on problem-specific modifications and analysis.
Future work might investigate more efficient algorithms for model search and training, more sophisticated mutation and recombination strategies, and hybrid approaches that integrate EAs with other hyperparameter optimization techniques. Moreover, improving computational efficiency while maintaining or improving accuracy will remain a pivotal research direction for making these methods viable for widespread application.
In summary, "Large-Scale Evolution of Image Classifiers" provides compelling evidence that evolutionary algorithms, when scaled adequately and designed thoughtfully, can autonomously discover high-performance neural network architectures, marking a significant stride toward fully automated machine learning model design.