Path-Level Network Transformation for Efficient Architecture Search (1806.02639v1)

Published 7 Jun 2018 in cs.LG, cs.AI, and stat.ML

Abstract: We introduce a new function-preserving transformation for efficient neural architecture search. This network transformation allows reusing previously trained networks and existing successful architectures that improves sample efficiency. We aim to address the limitation of current network transformation operations that can only perform layer-level architecture modifications, such as adding (pruning) filters or inserting (removing) a layer, which fails to change the topology of connection paths. Our proposed path-level transformation operations enable the meta-controller to modify the path topology of the given network while keeping the merits of reusing weights, and thus allow efficiently designing effective structures with complex path topologies like Inception models. We further propose a bidirectional tree-structured reinforcement learning meta-controller to explore a simple yet highly expressive tree-structured architecture space that can be viewed as a generalization of multi-branch architectures. We experimented on the image classification datasets with limited computational resources (about 200 GPU-hours), where we observed improved parameter efficiency and better test results (97.70% test accuracy on CIFAR-10 with 14.3M parameters and 74.6% top-1 accuracy on ImageNet in the mobile setting), demonstrating the effectiveness and transferability of our designed architectures.

Path-Level Network Transformation for Efficient Architecture Search

The paper presents a novel approach to neural architecture search (NAS) by proposing a method termed Path-Level Network Transformation. This innovation specifically targets the limitations of traditional layer-level transformations by focusing on path-level topological modifications, which enables more efficient architecture search while maintaining the ability to reuse pre-trained network weights. The paper integrates this method into a reinforcement learning framework to explore a tree-structured architecture space effectively.

Methodology Summary

Path-Level Network Transformation: The central innovation of the paper lies in the function-preserving network transformation at the path level. These operations extend the scope beyond layer-wise modifications, allowing the transformation of network path topology while preserving pre-trained weights. This is crucial for complex architectures like Inception models where multi-path connections are prevalent.
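To make the idea concrete, below is a minimal PyTorch sketch (not the authors' code; class and parameter names are illustrative) of one such function-preserving path-level operation: replicating a single layer into parallel branches whose outputs are averaged, so that the new multi-path module initially computes exactly the same function as the original layer.

```python
# Minimal sketch of a function-preserving "replication" split, assuming an
# average merge of identical branches. Names here are illustrative, not the
# paper's implementation.
import copy
import torch
import torch.nn as nn


class ReplicatedPath(nn.Module):
    """Wraps N identical copies of `layer`; averaging their outputs
    preserves the original layer's function at initialization."""

    def __init__(self, layer: nn.Module, num_branches: int = 2):
        super().__init__()
        self.branches = nn.ModuleList(
            copy.deepcopy(layer) for _ in range(num_branches)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Average merge: identical branches => output equals the original layer's.
        return torch.stack([b(x) for b in self.branches], dim=0).mean(dim=0)


if __name__ == "__main__":
    conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
    x = torch.randn(1, 16, 8, 8)
    transformed = ReplicatedPath(conv, num_branches=2)
    # Function-preserving up to floating-point error.
    assert torch.allclose(conv(x), transformed(x), atol=1e-6)
```

After such a split, each branch can be modified further by subsequent transformations while still starting from meaningful pre-trained weights, which is what allows the search to build complex path topologies without retraining from scratch.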

Reinforcement Learning Framework: The transformation operations are integrated with a bidirectional tree-structured reinforcement learning (RL) meta-controller. This setup exploits a tree-structured architecture space, providing a generalized view of multi-branch structures. The RL meta-controller is responsible for exploring this search space, dynamically sampling architectures, and evaluating their performance. The use of tree-structured LSTMs facilitates the encoding of input architectures in a manner that naturally corresponds to the hierarchical nature of network topologies.
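The sketch below (a simplified assumption-laden illustration, not the paper's implementation) shows the bidirectional idea: a bottom-up pass summarizes each subtree and a top-down pass propagates context from the root, so every node ends up with a state that a policy head could use to decide which transformation to apply there. The paper uses tree-structured LSTM units; a plain linear-plus-tanh update stands in here for brevity.

```python
# Simplified bidirectional tree encoder sketch; all module and field names
# are hypothetical and the update rule is deliberately simpler than the
# tree-structured LSTM used in the paper.
from dataclasses import dataclass, field
from typing import List

import torch
import torch.nn as nn


@dataclass
class Node:
    op_id: int                      # index of the primitive op at this node
    children: List["Node"] = field(default_factory=list)
    h_up: torch.Tensor = None       # filled by the bottom-up pass
    h_down: torch.Tensor = None     # filled by the top-down pass


class TreeEncoder(nn.Module):
    def __init__(self, num_ops: int, hidden: int = 32):
        super().__init__()
        self.embed = nn.Embedding(num_ops, hidden)
        self.up_cell = nn.Linear(2 * hidden, hidden)    # (op emb, child summary) -> h_up
        self.down_cell = nn.Linear(2 * hidden, hidden)  # (h_up, parent context) -> h_down

    def bottom_up(self, node: Node) -> torch.Tensor:
        emb = self.embed(torch.tensor([node.op_id]))
        if node.children:
            child_sum = torch.stack([self.bottom_up(c) for c in node.children]).sum(0)
        else:
            child_sum = torch.zeros_like(emb)
        node.h_up = torch.tanh(self.up_cell(torch.cat([emb, child_sum], dim=-1)))
        return node.h_up

    def top_down(self, node: Node, parent_state: torch.Tensor) -> None:
        node.h_down = torch.tanh(
            self.down_cell(torch.cat([node.h_up, parent_state], dim=-1))
        )
        for c in node.children:
            self.top_down(c, node.h_down)

    def forward(self, root: Node) -> None:
        root_state = self.bottom_up(root)
        self.top_down(root, torch.zeros_like(root_state))  # zero initial parent context


# Each node now carries (h_up, h_down); a policy head could map these states
# to transformation decisions at the corresponding position in the network.
tree = Node(op_id=0, children=[Node(op_id=1), Node(op_id=2, children=[Node(op_id=3)])])
TreeEncoder(num_ops=8)(tree)
```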

Experimental Results

The paper reports empirical evaluations primarily conducted on CIFAR-10 and ImageNet, showcasing significant improvements in architecture search efficiency and model performance. With restricted computational resources (approximately 200 GPU-hours), the architectures discovered using the proposed method achieved 97.70% test accuracy on CIFAR-10 with 14.3M parameters and 74.6% top-1 accuracy on ImageNet in the mobile setting. Notably, these results were obtained with a fraction of the computational resources required by other NAS approaches, such as those reported by Zoph et al., which utilized 48,000 GPU-hours.

Implications and Future Directions

Parameter Efficiency and Transferability: The capability to discover architectures with high parameter efficiency was demonstrated by improvements over existing DenseNets and PyramidNets. The discovered architectures also transferred well from CIFAR-10 to ImageNet, underscoring the generality of the path-level transformations.

Theoretical Implications: Theoretical implications include the broadening of architecture search spaces to include diverse path topologies. This empowers the NAS framework to explore beyond traditional chain-structured networks, which could lead to discovering novel architectural insights.

Future Developments: The fusion of the proposed transformation framework with network compression techniques holds potential for further advancements. Future work could explore reducing model complexity without sacrificing performance, which is beneficial for deploying NAS-derived models in resource-constrained environments.

In conclusion, the paper provides an exciting advancement in the development of NAS techniques, specifically highlighting the significance of path-level transformations. By leveraging a tree-structured representation and bidirectional RL controllers, the proposed approach enhances both the efficiency and quality of neural architecture design, setting a promising foundation for future research in automated model development.

Authors (5)
  1. Han Cai (79 papers)
  2. Jiacheng Yang (11 papers)
  3. Weinan Zhang (322 papers)
  4. Song Han (155 papers)
  5. Yong Yu (219 papers)
Citations (203)