- The paper demonstrates that the magnitude of architecture parameters in differentiable NAS methods such as DARTS does not reliably indicate operation strength.
- It proposes a perturbation-based method to evaluate operation strength by observing the impact of masking operations on supernet validation accuracy.
- Applying this perturbation-based selection consistently improves architecture performance and reduces test error rates across various differentiable NAS methods compared to magnitude-based selection.
Overview of "Rethinking Architecture Selection in Differentiable NAS"
This paper addresses a critical assumption in differentiable Neural Architecture Search (NAS) methods, particularly DARTS, regarding how the final architecture is selected. Differentiable NAS methods, DARTS in particular, have gained popularity for their search efficiency and simplicity: architecture parameters are optimized jointly with model weights using gradient-based techniques. Conventionally, the operation with the largest architecture parameter, α, is selected on each edge, under the assumption that α magnitude reflects operation strength.
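To make the conventional procedure concrete, the following is a minimal sketch (not the paper's code) of a DARTS-style mixed operation and magnitude-based selection on a single edge. The operation list and helper names are illustrative assumptions, written in a PyTorch-like style.

```python
import torch
import torch.nn.functional as F

# Hypothetical candidate operations on one edge of the DARTS supernet.
OP_NAMES = ["none", "skip_connect", "sep_conv_3x3", "max_pool_3x3"]

def mixed_op_output(x, ops, alpha):
    """DARTS-style mixed operation: a softmax over the architecture
    parameters alpha weights the outputs of all candidate operations."""
    weights = F.softmax(alpha, dim=-1)                  # one weight per op
    return sum(w * op(x) for w, op in zip(weights, ops))

def magnitude_based_selection(alpha):
    """Conventional selection: keep the operation with the largest alpha.
    The paper argues this magnitude is not a reliable proxy for strength."""
    return OP_NAMES[int(torch.argmax(alpha))]
```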
Key Contributions and Findings
- Misalignment of α Values and Operation Strength: The authors demonstrate through empirical and theoretical analysis that the magnitude of architecture parameters does not reliably indicate how much an operation contributes to the supernet's performance. This challenges the convention of selecting operations by their α values.
- Perturbation-Based Architecture Selection: The paper proposes an alternative method that evaluates operation strength by its influence on the supernet. Specifically, it masks each operation in turn and observes the impact on validation accuracy, thereby identifying the operations crucial to supernet performance without relying on α values (a minimal code sketch follows this list).
- Improvement Across Differentiable NAS Methods: Applying perturbation-based selection consistently improves architecture performance across differentiable NAS methods, including standard DARTS, SDARTS, and SGAS. The paper reports numerical results showing reduced test error rates compared to magnitude-based selection.
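Below is a minimal sketch of the perturbation-based idea, assuming hypothetical hooks `supernet.mask_op` / `supernet.unmask_op` that temporarily remove a single candidate operation, and a hypothetical `evaluate` callable that returns validation accuracy; it is an illustration of the selection criterion, not the authors' implementation.

```python
def perturbation_based_selection(supernet, edge, op_names, evaluate):
    """Select the operation on `edge` whose removal hurts the supernet's
    validation accuracy the most, i.e., the operation the supernet relies
    on most heavily.

    `evaluate(supernet)` is an assumed helper returning validation
    accuracy; `mask_op`/`unmask_op` are assumed to zero out one candidate
    operation without retraining the supernet."""
    base_acc = evaluate(supernet)
    drops = {}
    for op in op_names:
        supernet.mask_op(edge, op)          # perturb: remove this op only
        drops[op] = base_acc - evaluate(supernet)
        supernet.unmask_op(edge, op)        # restore the full supernet
    # The operation whose masking causes the largest accuracy drop is
    # deemed the strongest and is kept in the final architecture.
    return max(drops, key=drops.get)
```

In the paper's procedure, this evaluation is applied edge by edge, and the supernet weights are briefly tuned after each edge is discretized so that subsequent measurements remain meaningful.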
Implications and Future Directions
The findings suggest that differentiable NAS methods could benefit from basing architecture selection on direct performance contributions rather than parameter magnitude, potentially yielding more robust and effective architectures. This shift may also alleviate known issues such as DARTS's robustness problem, in which the search often selects degenerate architectures with poor generalizability.
The paper opens avenues for further research into NAS optimization strategies, including new methodologies for evaluating operation contributions that could complement or replace existing bilevel optimization frameworks. These insights could inform future architecture design processes, enabling more reliable and efficient architecture searches with practical applications in both commercial and academic settings.
Conclusion
Rethinking architecture selection in differentiable NAS, as introduced in this paper, provides a substantial advancement in the understanding and development of NAS methodologies. The proposed perturbation-based selection method not only challenges existing practices but also demonstrates how improved architecture performance can be achieved, paving the way for more exploratory research into operation selection criteria within neural architecture search.