- The paper presents an early stopping method that mitigates DARTS collapse from excessive skip-connects and enhances neural network performance.
- It employs two stopping criteria: halting when two or more skip-connects appear in a normal cell, or when the rankings of architecture parameters for learnable operations stabilize.
- Experimental results demonstrate superior accuracy and reduced search time on datasets like CIFAR10, CIFAR100, Tiny-ImageNet, and ImageNet compared to standard DARTS.
Overview of "DARTS+: Improved Differentiable Architecture Search with Early Stopping"
In the field of Neural Architecture Search (NAS), Differentiable Architecture Search (DARTS) has emerged as a promising method. It offers a gradient-based, bi-level optimization technique that explores the architecture space efficiently. However, DARTS suffers from a performance "collapse" caused by overfitting when the search runs for too many epochs, most visibly as a surge of skip-connect operations in the selected architectures. The paper introduces "DARTS+", which mitigates this collapse with an early stopping mechanism and thereby improves performance.
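For context, the sketch below (PyTorch-style) shows the alternating bi-level update at the heart of DARTS: architecture parameters are updated on validation data and network weights on training data. The `supernet`, optimizer, and loader names are hypothetical, and only the first-order approximation is shown.

```python
def search_epoch(supernet, train_loader, valid_loader,
                 w_optimizer, alpha_optimizer, criterion):
    """One epoch of the alternating DARTS search (first-order approximation).

    `supernet`, the two optimizers, and the loader pairing are hypothetical
    names used for this sketch, not the paper's exact API.
    """
    for (x_train, y_train), (x_valid, y_valid) in zip(train_loader, valid_loader):
        # Step 1: update the architecture parameters (alpha) on validation data.
        alpha_optimizer.zero_grad()
        criterion(supernet(x_valid), y_valid).backward()
        alpha_optimizer.step()

        # Step 2: update the network weights (w) on training data.
        w_optimizer.zero_grad()
        criterion(supernet(x_train), y_train).backward()
        w_optimizer.step()
```

DARTS+ leaves this search loop unchanged; it only decides when to stop it.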
Collapse Issue in DARTS
The collapse problem in DARTS manifests when the search yields architectures with excessive skip-connects, producing shallow models with diminished performance. The issue originates in overfitting during the architecture search: as the supernet weights adapt too closely to the training data, the architecture parameters increasingly favor skip-connects over learnable operations, which reduces the expressive power of the resulting networks.
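To see why dominant skip-connect weights translate into shallow cells, here is a rough sketch of the standard discretization step that turns architecture parameters into a discrete cell; the operation list, the alpha layout, and the per-edge argmax rule are simplifications assumed for illustration.

```python
import torch

# Simplified sketch of how DARTS discretizes a searched cell: on each edge,
# keep the operation with the largest softmax-normalized architecture weight,
# excluding 'none'. The OPS list and the alpha layout (num_edges x num_ops)
# are assumptions for this sketch; real DARTS additionally keeps only the
# two strongest input edges per node.
OPS = ['none', 'skip_connect', 'sep_conv_3x3', 'sep_conv_5x5',
       'dil_conv_3x3', 'dil_conv_5x5', 'max_pool_3x3', 'avg_pool_3x3']

def derive_cell(alpha):
    probs = torch.softmax(alpha.detach(), dim=-1)   # alpha: (num_edges, num_ops)
    probs[:, OPS.index('none')] = 0.0               # 'none' is never selected
    return [OPS[i] for i in probs.argmax(dim=-1).tolist()]

# When prolonged search drives the skip_connect weights up on many edges,
# derive_cell returns a cell dominated by skip-connects, i.e. an overly
# shallow network with few learnable operations.
```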
DARTS+ Approach
To counter the collapse, DARTS+ ends the search early. Two stopping criteria are proposed: (1) stop when two or more skip-connects appear within a normal cell, and (2) stop when the ranking of architecture parameters for learnable operations has stabilized. Both criteria aim to halt the search before overfitting drives excessive skip-connect usage and damages the final architecture.
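A minimal sketch of how these two checks might be run once per search epoch, reusing the `OPS` list and `derive_cell` helper from the sketch above; the `should_stop` name, the 'conv' filter for learnable operations, and the `stable_epochs` window are illustrative assumptions rather than the paper's exact implementation.

```python
def should_stop(alpha_normal, ranking_history, stable_epochs=10):
    """Check the two DARTS+ stopping criteria (sketch; thresholds assumed)."""
    # Criterion 1: the derived normal cell contains two or more skip-connects.
    if derive_cell(alpha_normal).count('skip_connect') >= 2:
        return True

    # Criterion 2: the ranking of architecture parameters for learnable
    # (parameterized) operations has stayed unchanged for `stable_epochs` epochs.
    learnable = [i for i, op in enumerate(OPS) if 'conv' in op]
    ranking = torch.argsort(alpha_normal.detach()[:, learnable], dim=-1)
    ranking_history.append(ranking)
    recent = ranking_history[-stable_epochs:]
    return len(recent) == stable_epochs and all(torch.equal(r, recent[0]) for r in recent)
```

In this sketch, the search would simply terminate at the first epoch for which the check returns True, and the final architecture would be derived from the architecture parameters at that point.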
Experimental Validation
Extensive experiments across several datasets, including CIFAR10, CIFAR100, Tiny-ImageNet-200, and ImageNet, demonstrate the efficacy of DARTS+. The algorithm consistently outperforms standard DARTS, achieving test errors of 2.32% on CIFAR10 and 14.87% on CIFAR100, and it also performs well on larger benchmarks such as ImageNet.
Moreover, comparisons with existing methods show that DARTS+ keeps the architecture search under control, requiring less search time while achieving superior accuracy. The paper also shows that early stopping not only counters overfitting but also matches how the search actually behaves: the architecture parameters established early in the search are crucial in determining the final outcome.
Implications and Future Directions
The findings underscore the need for control mechanisms in NAS to prevent overfitting. The role of early stopping in DARTS+ highlights a methodological advancement that could be adopted beyond DARTS to improve other NAS techniques. Future work might explore adaptive stopping criteria driven by more dynamic, data-dependent signals, potentially combined with automatic adjustment of architecture parameters based on intermediate evaluation results. Further research could also evaluate the robustness of early stopping across model scales and its applicability in real-world scenarios where resource constraints are paramount.
In conclusion, DARTS+ represents a pragmatic advancement in NAS methodology, demonstrating that seemingly simple modifications like early stopping can have substantial impacts on search efficiency and final model performance. Such insights contribute to the broader goal of automated, efficient design of neural architectures, paving the way for more reliable and powerful AI models.