
PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search (1907.05737v4)

Published 12 Jul 2019 in cs.CV and cs.LG

Abstract: Differentiable architecture search (DARTS) provided a fast solution for finding effective network architectures, but suffered from large memory and computing overheads in jointly training a super-network and searching for an optimal architecture. In this paper, we present a novel approach, namely, Partially-Connected DARTS, by sampling a small part of the super-network to reduce the redundancy in exploring the network space, thereby performing a more efficient search without compromising the performance. In particular, we perform operation search in a subset of channels while bypassing the held-out part in a shortcut. This strategy may suffer from an undesired inconsistency in selecting the edges of the super-network caused by sampling different channels. We alleviate it using edge normalization, which adds a new set of edge-level parameters to reduce uncertainty in search. Thanks to the reduced memory cost, PC-DARTS can be trained with a larger batch size and, consequently, enjoys both faster speed and higher training stability. Experimental results demonstrate the effectiveness of the proposed method. Specifically, we achieve an error rate of 2.57% on CIFAR10 with merely 0.1 GPU-days for architecture search, and a state-of-the-art top-1 error rate of 24.2% on ImageNet (under the mobile setting) using 3.8 GPU-days for search. Our code has been made available at: https://github.com/yuhuixu1993/PC-DARTS.

Citations (569)

Summary

  • The paper introduces a novel partial channel sampling method that reduces memory use and computational cost for architecture search.
  • It employs edge normalization to stabilize edge selection under stochastic channel sampling, while the reduced memory footprint permits larger batches and faster training.
  • Experimental results show 2.57% error on CIFAR10 in 0.1 GPU-days and 24.2% top-1 error on ImageNet in 3.8 GPU-days.

Overview of PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search

PC-DARTS is a memory-efficient variant of differentiable architecture search (DARTS) that addresses the large memory and computation overheads of jointly training a full super-network while searching for an architecture. The primary motivation is to make architecture search substantially cheaper while maintaining or improving the quality of the discovered networks.

Key Contributions

The paper introduces Partially-Connected DARTS (PC-DARTS), a novel methodology that reduces the redundancy in network search without sacrificing performance. The salient contributions are:

  1. Partial Channel Sampling: Instead of feeding all channels of an edge's input through the candidate operations, a random 1/K subset of channels is sampled at each step, while the remaining channels bypass the operations through a shortcut. This reduces the memory and computational burden, allowing for larger batch sizes and hence faster, more stable training (see the first sketch after this list).
  2. Edge Normalization: Because different channel subsets are sampled at different steps, the operation-level architecture weights become noisy, which makes edge selection in the super-network inconsistent. Edge normalization counteracts this by introducing an additional set of edge-level parameters that weight each input edge of a node, reducing uncertainty when the final architecture is derived (see the second sketch below).
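
The partial channel connection can be illustrated with a short PyTorch sketch. This is a minimal, simplified rendition rather than the authors' implementation: the class name PartialChannelMixedOp, the default K = 4 split, and the assumption that every candidate operation has stride 1 and maps C/K channels to C/K channels are all illustrative choices. The channel shuffle at the end mirrors the trick of reordering channels so that different subsets are sampled across iterations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_shuffle(x, groups):
    # Reorder channels so a different 1/groups subset is sampled at each step.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class PartialChannelMixedOp(nn.Module):
    """Softmax-weighted mix of candidate ops applied to 1/k of the input
    channels; the remaining channels bypass the ops via a shortcut."""
    def __init__(self, ops, channels, k=4):
        super().__init__()
        self.k = k
        self.dim = channels // k       # channels routed through the candidate ops
        self.ops = nn.ModuleList(ops)  # each op assumed to map C/k -> C/k, stride 1

    def forward(self, x, alpha):
        # alpha holds this edge's architecture parameters, one per candidate op.
        x_op, x_skip = x[:, :self.dim], x[:, self.dim:]
        w = F.softmax(alpha, dim=-1)
        mixed = sum(wi * op(x_op) for wi, op in zip(w, self.ops))
        out = torch.cat([mixed, x_skip], dim=1)
        return channel_shuffle(out, self.k)
```

Since only C/K channels pass through the operation mixture, activation memory on each edge drops by roughly a factor of K, which is what allows the larger batch sizes reported in the paper.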
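
Edge normalization can be sketched in the same style, continuing the imports above. EdgeNormalizedNode and its parameter initialization are again hypothetical, but the sketch shows the essential point: the edge-level weights beta are softmax-normalized independently of the operation-level weights alpha and, since beta does not depend on which channels happened to be sampled, it fluctuates far less across iterations.

```python
class EdgeNormalizedNode(nn.Module):
    """One intermediate node of a cell: sums its input edges, each a
    PartialChannelMixedOp, weighted by softmax-normalized edge weights."""
    def __init__(self, edges, num_ops):
        super().__init__()
        self.edges = nn.ModuleList(edges)
        # Operation-level weights (alpha) per edge and edge-level weights (beta).
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(edges), num_ops))
        self.beta = nn.Parameter(1e-3 * torch.randn(len(edges)))

    def forward(self, inputs):
        # inputs: one feature map per incoming edge of this node.
        edge_w = F.softmax(self.beta, dim=-1)
        return sum(w * edge(x, a)
                   for w, edge, x, a in zip(edge_w, self.edges, inputs, self.alpha))
```

At the end of search, the connectivity of the final cell is decided by the product of edge-level and operation-level weights rather than by alpha alone, which stabilizes edge selection against the channel-sampling noise.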

Numerical Results

PC-DARTS demonstrates its efficacy with strong experimental results:

  • An error rate of 2.57% on CIFAR10 with a search cost of only 0.1 GPU-days.
  • A state-of-the-art top-1 error rate of 24.2% on ImageNet under the mobile setting, with a search cost of 3.8 GPU-days.

These results highlight a marked improvement over prior DARTS methods, particularly in reducing search time.

Implications and Future Directions

The implications of PC-DARTS are substantial:

  • Practical Implications: The enhanced efficiency in architecture search makes PC-DARTS attractive for real-world applications, where computational resources are often a limiting factor.
  • Theoretical Implications: This work challenges the assumed dependency on full channel evaluations in DARTS. By demonstrating that evaluating only a sampled subset of channels can still yield robust architectures, PC-DARTS invites further exploration of stochastic methods in network architecture search.
  • Future Developments: As a pioneering strategy in channel sampling within NAS, PC-DARTS opens avenues for integrating similar strategies into other search algorithms. Future research might delve into optimal sampling strategies and how these affect the broader NAS performance landscape.

Conclusion

PC-DARTS offers a significant step forward in neural architecture search by directly tackling its computation and memory inefficiencies. Through partial channel connections and edge normalization, it establishes a new standard for efficiency and stability in NAS, paving the way for further exploration of stochastic, reduced-complexity approaches to neural architecture design.