- The paper presents an early stopping method that mitigates DARTS collapse from excessive skip-connects and enhances neural network performance.
- It employs two stopping criteria: halting when two or more skip-connects appear in a normal cell, or when the rankings of architecture parameters for learnable operations stabilize.
- Experimental results demonstrate superior accuracy and reduced search time on datasets like CIFAR10, CIFAR100, Tiny-ImageNet, and ImageNet compared to standard DARTS.
Overview of "DARTS+: Improved Differentiable Architecture Search with Early Stopping"
In the field of Neural Architecture Search (NAS), Differentiable Architecture Search (DARTS) has emerged as a promising method. It offers a gradient-based, bi-level optimization technique that explores the architecture space efficiently. However, DARTS suffers from a performance "collapse" caused by overfitting when the search runs for too many epochs, most visibly as a surge of skip-connect operations in the selected architectures. The paper introduces "DARTS+", which mitigates this collapse with an early stopping mechanism and thereby improves performance.
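For context, the sketch below (PyTorch-style) shows the alternating bi-level update at the heart of DARTS: architecture parameters are updated on validation data and network weights on training data. The `supernet`, optimizer, and loader names are hypothetical, and only the first-order approximation is shown.

```python
def search_epoch(supernet, train_loader, valid_loader,
                 w_optimizer, alpha_optimizer, criterion):
    """One epoch of the alternating DARTS search (first-order approximation).

    `supernet`, the two optimizers, and the loader pairing are hypothetical
    names used for this sketch, not the paper's exact API.
    """
    for (x_train, y_train), (x_valid, y_valid) in zip(train_loader, valid_loader):
        # Step 1: update the architecture parameters (alpha) on validation data.
        alpha_optimizer.zero_grad()
        criterion(supernet(x_valid), y_valid).backward()
        alpha_optimizer.step()

        # Step 2: update the network weights (w) on training data.
        w_optimizer.zero_grad()
        criterion(supernet(x_train), y_train).backward()
        w_optimizer.step()
```

DARTS+ leaves this search loop unchanged; it only decides when to stop it.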
Collapse Issue in DARTS
The collapse problem in DARTS manifests when the search yields architectures with excessive skip-connects, producing shallow models with diminished performance. The issue originates in overfitting during the architecture search: as the supernet weights adapt too closely to the training data, the architecture parameters increasingly favor skip-connects over learnable operations, which reduces the expressive power of the resulting networks.
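To see why dominant skip-connect weights translate into shallow cells, here is a rough sketch of the standard discretization step that turns architecture parameters into a discrete cell; the operation list, the alpha layout, and the per-edge argmax rule are simplifications assumed for illustration.

```python
import torch

# Simplified sketch of how DARTS discretizes a searched cell: on each edge,
# keep the operation with the largest softmax-normalized architecture weight,
# excluding 'none'. The OPS list and the alpha layout (num_edges x num_ops)
# are assumptions for this sketch; real DARTS additionally keeps only the
# two strongest input edges per node.
OPS = ['none', 'skip_connect', 'sep_conv_3x3', 'sep_conv_5x5',
       'dil_conv_3x3', 'dil_conv_5x5', 'max_pool_3x3', 'avg_pool_3x3']

def derive_cell(alpha):
    probs = torch.softmax(alpha.detach(), dim=-1)   # alpha: (num_edges, num_ops)
    probs[:, OPS.index('none')] = 0.0               # 'none' is never selected
    return [OPS[i] for i in probs.argmax(dim=-1).tolist()]

# When prolonged search drives the skip_connect weights up on many edges,
# derive_cell returns a cell dominated by skip-connects, i.e. an overly
# shallow network with few learnable operations.
```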
DARTS+ Approach
To counter the collapse, DARTS+ ends the search early. Two stopping criteria are proposed: (1) stop when two or more skip-connects appear within a normal cell, and (2) stop when the ranking of architecture parameters for learnable operations has stabilized. Both criteria aim to halt the search before overfitting drives excessive skip-connect usage and damages the final architecture.
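A minimal sketch of how these two checks might be run once per search epoch, reusing the `OPS` list and `derive_cell` helper from the sketch above; the `should_stop` name, the 'conv' filter for learnable operations, and the `stable_epochs` window are illustrative assumptions rather than the paper's exact implementation.

```python
def should_stop(alpha_normal, ranking_history, stable_epochs=10):
    """Check the two DARTS+ stopping criteria (sketch; thresholds assumed)."""
    # Criterion 1: the derived normal cell contains two or more skip-connects.
    if derive_cell(alpha_normal).count('skip_connect') >= 2:
        return True

    # Criterion 2: the ranking of architecture parameters for learnable
    # (parameterized) operations has stayed unchanged for `stable_epochs` epochs.
    learnable = [i for i, op in enumerate(OPS) if 'conv' in op]
    ranking = torch.argsort(alpha_normal.detach()[:, learnable], dim=-1)
    ranking_history.append(ranking)
    recent = ranking_history[-stable_epochs:]
    return len(recent) == stable_epochs and all(torch.equal(r, recent[0]) for r in recent)
```

In this sketch, the search would simply terminate at the first epoch for which the check returns True, and the final architecture would be derived from the architecture parameters at that point.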
Experimental Validation
Extensive experiments across several datasets, including CIFAR10, CIFAR100, Tiny-ImageNet-200, and ImageNet, demonstrate the efficacy of DARTS+. The algorithm consistently outperforms standard DARTS, achieving test errors of 2.32% on CIFAR10 and 14.87% on CIFAR100, and it also performs well on larger benchmarks such as ImageNet.
Moreover, comparisons with existing methods show that DARTS+ keeps the architecture search under control, requiring less search time while achieving superior accuracy. The paper also shows that early stopping not only counters overfitting but also matches how the search actually behaves: the architecture parameters established early in the search are crucial in determining the final outcome.
Implications and Future Directions
The findings underscore the need for control mechanisms in NAS to prevent overfitting. The role of early stopping in DARTS+ highlights a methodological advancement that could be adopted beyond DARTS to improve other NAS techniques. Future work might explore adaptive stopping criteria driven by more dynamic, data-dependent signals, potentially combined with automatic adjustment of architecture parameters based on intermediate evaluation results. Further research could also evaluate the robustness of early stopping across model scales and its applicability in real-world scenarios where resource constraints are paramount.
In conclusion, DARTS+ represents a pragmatic advancement in NAS methodology, demonstrating that seemingly simple modifications like early stopping can have substantial impacts on search efficiency and final model performance. Such insights contribute to the broader goal of automated, efficient design of neural architectures, paving the way for more reliable and powerful AI models.