Genetic CNN (1703.01513v1)

Published 4 Mar 2017 in cs.CV

Abstract: The deep Convolutional Neural Network (CNN) is the state-of-the-art solution for large-scale visual recognition. Following basic principles such as increasing the depth and constructing highway connections, researchers have manually designed a lot of fixed network structures and verified their effectiveness. In this paper, we discuss the possibility of learning deep network structures automatically. Note that the number of possible network structures increases exponentially with the number of layers in the network, which inspires us to adopt the genetic algorithm to efficiently traverse this large search space. We first propose an encoding method to represent each network structure in a fixed-length binary string, and initialize the genetic algorithm by generating a set of randomized individuals. In each generation, we define standard genetic operations, e.g., selection, mutation and crossover, to eliminate weak individuals and then generate more competitive ones. The competitiveness of each individual is defined as its recognition accuracy, which is obtained via training the network from scratch and evaluating it on a validation set. We run the genetic process on two small datasets, i.e., MNIST and CIFAR10, demonstrating its ability to evolve and find high-quality structures which are little studied before. These structures are also transferrable to the large-scale ILSVRC2012 dataset.

Summary

The paper introduces a genetic algorithm that encodes CNN architectures as fixed-length binary strings for automated design.
It employs selection, mutation, and crossover operations to efficiently search a vast network design space.
Experiments demonstrate that the evolved architectures achieve low error rates on MNIST and CIFAR-10 and perform competitively on ILSVRC2012.

Genetic CNN: Automated Deep Network Design Using Genetic Algorithms

Lingxi Xie and Alan Yuille's paper "Genetic CNN" proposes an approach to automatic network design based on genetic algorithms (GAs). It addresses a significant challenge in designing Convolutional Neural Networks (CNNs): automating an architecture design process that has traditionally relied on manual configuration.

The authors introduce a method for encoding network structures into fixed-length binary strings, facilitating the application of GAs for structure optimization. This encoded representation allows the exploration of a large space of potential architectures efficiently, leveraging genetic operations such as selection, mutation, and crossover.
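
To make the fixed-length encoding concrete, here is a minimal Python sketch of decoding a binary string into per-stage connections. It assumes that each stage with K ordered nodes spends one bit on every node pair (i, j) with i < j, and that bits are ordered by destination node and then by source node. The function names `bits_per_stage`, `decode_stage`, and `decode_genome` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def bits_per_stage(num_nodes: int) -> int:
    """Bits needed for one stage: one per node pair (i, j) with i < j."""
    return num_nodes * (num_nodes - 1) // 2


def decode_stage(bits: str, num_nodes: int) -> list[tuple[int, int]]:
    """Return the directed edges (i, j) whose bit is '1'.

    Assumed bit order: (1,2), (1,3), (2,3), (1,4), (2,4), (3,4), ...
    """
    if len(bits) != bits_per_stage(num_nodes):
        raise ValueError("bit string length does not match node count")
    pairs = sorted(
        combinations(range(1, num_nodes + 1), 2),
        key=lambda pair: (pair[1], pair[0]),
    )
    return [pair for pair, bit in zip(pairs, bits) if bit == "1"]


def decode_genome(genome: str, nodes_per_stage: list[int]) -> list[list[tuple[int, int]]]:
    """Split a full genome into stages and decode each stage's edges."""
    stages, start = [], 0
    for num_nodes in nodes_per_stage:
        end = start + bits_per_stage(num_nodes)
        stages.append(decode_stage(genome[start:end], num_nodes))
        start = end
    if start != len(genome):
        raise ValueError("genome is longer than the stages require")
    return stages


if __name__ == "__main__":
    # A 4-node stage uses 6 bits; "100111" switches on four connections.
    print(decode_stage("100111", 4))
```

Because every stage has a fixed node count, the genome length is fixed too, which is what lets standard bit-level mutation and crossover operate on it directly.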

Key Contributions and Methodology

Binary Encoding of Network Structures:
- The authors propose an encoding scheme where each network structure is represented by a fixed-length binary string. This encoding method allows the network layers and connections to be systematically mutated and evolved.
Genetic Algorithm Application:
- Selection: Individuals (network architectures) are selected based on their fitness, defined as the recognition accuracy obtained by training the network from scratch and evaluating it on a validation set.
- Mutation: Random alterations are applied to the binary string encoding.
- Crossover: Binary strings of two individuals are combined to produce new offspring, merging architectural traits of parent networks.
Experimental Validation:
- The genetic process was run on two small datasets, MNIST and CIFAR-10, where it evolved high-quality structures that had been little studied before.
- The learned structures demonstrated strong generalization, producing competitive results when transferred to the large-scale ILSVRC2012 dataset.
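
The three genetic operations listed above can be sketched as a small Python loop over binary strings. This is an illustrative sketch under stated assumptions, not the authors' implementation: the fitness function is a toy stand-in (in the paper, fitness requires training each network and measuring validation accuracy), crossover is a simple single-point swap, and the names `mutate`, `crossover`, `select`, `evolve`, and `toy_fitness`, along with all default parameter values, are hypothetical.

```python
import random


def mutate(genome: str, flip_prob: float, rng: random.Random) -> str:
    """Flip each bit independently with probability flip_prob."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < flip_prob else bit
        for bit in genome
    )


def crossover(a: str, b: str, rng: random.Random) -> tuple[str, str]:
    """Single-point crossover (an assumption; other schemes are possible)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def select(population: list[str], scores: list[float], rng: random.Random) -> list[str]:
    """Fitness-proportional selection with replacement.

    Scores are shifted by the minimum so weak individuals are rarely kept.
    """
    floor = min(scores)
    weights = [score - floor + 1e-9 for score in scores]
    return rng.choices(population, weights=weights, k=len(population))


def evolve(fitness, genome_len, pop_size=20, generations=10,
           flip_prob=0.05, cross_prob=0.2, seed=0) -> str:
    """Run selection, crossover, and mutation; return the best final genome."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice("01") for _ in range(genome_len))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        scores = [fitness(genome) for genome in population]
        population = select(population, scores, rng)
        for i in range(0, pop_size - 1, 2):
            if rng.random() < cross_prob:
                population[i], population[i + 1] = crossover(
                    population[i], population[i + 1], rng
                )
        population = [mutate(genome, flip_prob, rng) for genome in population]
    return max(population, key=fitness)


def toy_fitness(genome: str) -> float:
    """Placeholder fitness: fraction of '1' bits (NOT a trained CNN's accuracy)."""
    return genome.count("1") / len(genome)


if __name__ == "__main__":
    print(evolve(toy_fitness, genome_len=13))
```

In a real run, `toy_fitness` would be replaced by a function that builds the network described by the genome, trains it, and returns its validation accuracy, which is by far the dominant cost of the search.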

Experimental Results

MNIST and CIFAR-10

For MNIST, the genetic algorithm was initialized with network architectures whose two stages contain up to 3 and 5 nodes, respectively. After 50 generations, the best network achieved a recognition error rate of 0.34%, indicating that the GA can evolve effective network structures.
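
As a rough sense of scale for this MNIST setting, the following Python sketch counts how many distinct bit strings the search covers. It assumes the encoding uses one bit per node pair within a stage, so a stage with K nodes needs K(K-1)/2 bits; the helper name `genome_length` is hypothetical.

```python
def genome_length(nodes_per_stage: list[int]) -> int:
    """Total bits, assuming one bit per node pair (i < j) in each stage."""
    return sum(k * (k - 1) // 2 for k in nodes_per_stage)


if __name__ == "__main__":
    bits = genome_length([3, 5])  # 3 bits for the first stage, 10 for the second
    print(bits, 2 ** bits)        # prints: 13 8192
```

Each added node in a stage adds one more bit per existing node, so the number of candidate structures grows quickly as stages get larger, which is the motivation for a guided search rather than exhaustive enumeration.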

On CIFAR-10, the genetic process utilized an extended network encoding, with up to 5 nodes per stage across three stages. Key metrics like maximum, minimum, average, and median accuracies improved significantly over the generations, demonstrating the genetic algorithm's capability to explore and optimize a vast architecture space.

Transferability to Large-Scale Datasets

The best structures discovered on CIFAR-10 were transferred to the ILSVRC2012 dataset, indicating that the learned structures generalize beyond the dataset on which they were evolved. With about 22 layers, the resulting networks achieved a top-5 error rate of around 9.7%, comparable to deeper, more complex manually designed networks such as VGGNet and ResNet.

Implications and Future Directions

Practical Implications:
- The automation of network design via GAs can democratize access to state-of-the-art architectures, reducing reliance on expert manual tuning.
- This method provides a scalable and efficient strategy to tackle evolving network requirements across different datasets and tasks.
Theoretical Insights:
- The genetic algorithm’s ability to discover non-traditional and highly effective network structures highlights the potential for bio-inspired optimization techniques in deep learning.
- Future research could explore expanding the encoding scheme to include more diverse layer types and network components.
Future Developments:
- Integration of more complex genetic operations could potentially improve the exploration efficiency and quality of discovered structures.
- Concurrent evolution of both network structures and training methodologies might lead to even further performance enhancements.
- Application and validation across a broader range of tasks, including non-vision applications, would establish the generality of this approach.

Conclusion

The paper "Genetic CNN" by Xie and Yuille provides a comprehensive framework for automatically designing CNN architectures using genetic algorithms. The results underscore the capacity of GAs to navigate extensive architectural design spaces efficiently, presenting an effective alternative to the manual design of deep learning models. The proposed method's success across both small and large-scale datasets signifies a considerable step towards more autonomous and adaptive AI systems.
