Rethinking Architecture Selection in Differentiable NAS (2108.04392v1)

Published 10 Aug 2021 in cs.LG and cs.CV

Abstract: Differentiable Neural Architecture Search is one of the most popular Neural Architecture Search (NAS) methods for its search efficiency and simplicity, accomplished by jointly optimizing the model weight and architecture parameters in a weight-sharing supernet via gradient-based algorithms. At the end of the search phase, the operations with the largest architecture parameters will be selected to form the final architecture, with the implicit assumption that the values of architecture parameters reflect the operation strength. While much has been discussed about the supernet's optimization, the architecture selection process has received little attention. We provide empirical and theoretical analysis to show that the magnitude of architecture parameters does not necessarily indicate how much the operation contributes to the supernet's performance. We propose an alternative perturbation-based architecture selection that directly measures each operation's influence on the supernet. We re-evaluate several differentiable NAS methods with the proposed architecture selection and find that it is able to extract significantly improved architectures from the underlying supernets consistently. Furthermore, we find that several failure modes of DARTS can be greatly alleviated with the proposed selection method, indicating that much of the poor generalization observed in DARTS can be attributed to the failure of magnitude-based architecture selection rather than entirely the optimization of its supernet.

Citations (162)

Summary

  • The paper demonstrates that the magnitude of architecture parameters in differentiable NAS methods such as DARTS does not reliably indicate operation strength.
  • It proposes a perturbation-based method to evaluate operation strength by observing the impact of masking operations on supernet validation accuracy.
  • Applying this perturbation-based selection consistently improves architecture performance and reduces test error rates across various differentiable NAS methods compared to magnitude-based selection.

Overview of "Rethinking Architecture Selection in Differentiable NAS"

This paper addresses a critical assumption in differentiable Neural Architecture Search (NAS) methods, particularly DARTS, regarding architecture selection. Differentiable NAS, and DARTS specifically, have gained popularity for their search efficiency and simplicity: architecture parameters are optimized jointly with model weights in a weight-sharing supernet using gradient-based techniques. Conventionally, the operations associated with the largest architecture parameters α are selected, under the assumption that their magnitudes reflect operation strength.
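
For concreteness, the snippet below is a minimal sketch of this conventional magnitude-based rule, assuming a toy α tensor and a hypothetical list of candidate operations (neither is taken from the paper's code): on each edge, it simply keeps the operation with the largest softmaxed parameter, excluding the "none" operation.

```python
import torch

# Hypothetical candidate operations and architecture parameters:
# one row of alpha per edge, one column per candidate operation.
OPS = ["none", "skip_connect", "sep_conv_3x3", "max_pool_3x3"]
alpha = torch.randn(4, len(OPS))  # 4 edges, |OPS| candidates each

def magnitude_based_selection(alpha, ops=OPS):
    """DARTS-style selection: on each edge, keep the operation whose
    (softmaxed) architecture parameter is largest, ignoring 'none'."""
    weights = torch.softmax(alpha, dim=-1)
    selected = []
    for edge_weights in weights:
        edge_weights = edge_weights.clone()
        edge_weights[ops.index("none")] = -1.0  # 'none' is never selected
        selected.append(ops[int(edge_weights.argmax())])
    return selected

print(magnitude_based_selection(alpha))
```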

Key Contributions and Findings

  • Misalignment of α Values and Operation Strength: The authors demonstrate through empirical and theoretical analysis that the magnitude of architecture parameters does not reliably indicate the contribution of an operation to a supernet’s performance. This challenges the convention of using α values for architecture selection.
  • Perturbation-Based Architecture Selection: The paper proposes an alternative method that evaluates operation strength by its direct influence on the supernet. Specifically, this involves masking each candidate operation and observing the impact on validation accuracy, thereby identifying the operations crucial to supernet performance without relying on α values (a code sketch follows this list).
  • Improvement Across Differentiable NAS Methods: Applying perturbation-based selection consistently improves architecture performance across differentiable NAS methods, including standard DARTS, SDARTS, and SGAS variants. The paper provides numerical results demonstrating reductions in test error rates compared to magnitude-based selection.
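
As a rough illustration of the idea (not the paper's exact procedure), the sketch below assumes a hypothetical supernet interface with mask_operation/unmask_operation methods for an edge and an evaluate(supernet, val_loader) helper returning validation accuracy; the operation kept on an edge is the one whose removal degrades the supernet the most.

```python
def perturbation_based_selection(supernet, edge, candidate_ops, val_loader, evaluate):
    """Score each candidate operation on `edge` by how much validation
    accuracy drops when that operation is masked out of the supernet,
    then keep the operation with the largest drop.

    `supernet.mask_operation`, `supernet.unmask_operation`, and
    `evaluate` are assumed interfaces for this sketch, not the paper's API.
    """
    base_acc = evaluate(supernet, val_loader)
    scores = {}
    for op in candidate_ops:
        supernet.mask_operation(edge, op)        # temporarily disable `op` on this edge
        acc_without_op = evaluate(supernet, val_loader)
        supernet.unmask_operation(edge, op)      # restore the supernet
        scores[op] = base_acc - acc_without_op   # contribution of `op` to supernet accuracy
    # The operation whose removal hurts validation accuracy the most is kept.
    return max(scores, key=scores.get)
```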

Implications and Future Directions

The findings suggest that differentiable NAS methods could benefit from shifting architecture selection criteria from parameter magnitude to direct performance contribution, potentially yielding more robust and effective architectures. This shift may alleviate known failure modes of DARTS, which often selects degenerate architectures with poor generalization.

The paper opens avenues for further research into NAS optimization strategies, possibly exploring new methodologies for evaluating operation contributions that could replace or enhance existing bilevel optimization frameworks. Future AI developments could leverage these insights to refine neural architecture design processes, facilitating more reliable and efficient architecture searches with direct practical applications in commercial and academic domains.

Conclusion

Rethinking architecture selection in differentiable NAS, as introduced in this paper, provides a substantial advance in the understanding and development of NAS methodologies. The proposed perturbation-based selection method not only challenges existing practice but also demonstrates how improved architecture performance can be achieved, paving the way for further exploratory research into operation selection criteria in neural architecture search.