- The paper identifies failure modes of DARTS in which the search returns degenerate architectures that generalize poorly, demonstrated across 12 NAS benchmarks.
- It reveals a strong correlation between large dominant eigenvalues of the Hessian of the validation loss with respect to the architecture parameters and poor test performance, pointing to curvature as the culprit.
- The study proposes regularization of the inner optimization and an eigenvalue-based early-stopping criterion to robustify DARTS across diverse tasks.
Insights on Differentiable Architecture Search: Challenges and Robustification
The paper "Understanding and Robustifying Differentiable Architecture Search" addresses several critical issues associated with Differentiable Architecture Search (DARTS), a method that improves the efficiency of Neural Architecture Search (NAS). While DARTS significantly reduces the computational overhead associated with NAS, making it feasible for various challenging tasks, it also exhibits notable inconsistencies, particularly in its robustness across different search spaces and problems.
Main Contributions
The paper provides several key contributions to understanding and enhancing DARTS:
- Identification of DARTS Failure Modes: The research identifies multiple search spaces and tasks in which DARTS yields degenerate architectures (for example, cells dominated by parameter-free operations such as skip connections) with poor generalization performance. Twelve NAS benchmarks were used to demonstrate these weaknesses.
- Correlation Between Curvature and Generalization Performance: Through extensive empirical analysis, the paper shows a strong correlation between the dominant eigenvalue of the Hessian of the validation loss with respect to the architecture parameters and the resulting architecture's test performance. The solutions DARTS converges to lie in sharp, high-curvature regions of the validation loss, and these sharp solutions generalize poorly (see the power-iteration sketch after this list).
- Proposals for Robustification: To mitigate these issues, the authors propose regularization strategies that stabilize DARTS, steering it toward solutions with lower curvature and better generalization. These include an early-stopping criterion based on the growth of the dominant eigenvalue (a sketch follows below) and stronger regularization of the inner weight-optimization problem so that the search favors flatter regions of the landscape.
- Evaluation Across Diverse Tasks and Domains: The proposed modifications show improved robustness across various search spaces for tasks ranging from image classification to language modeling and dense regression problems such as disparity estimation.
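The curvature signal above is the dominant eigenvalue of the Hessian of the validation loss with respect to the architecture parameters. A standard way to estimate it without ever materializing the Hessian is power iteration on Hessian-vector products; below is a minimal PyTorch sketch, assuming `val_loss` was computed from a model whose architecture parameters are the tensors in `alphas`:

```python
import torch

def dominant_hessian_eigenvalue(val_loss, alphas, iters=20, eps=1e-12):
    """Estimate the dominant eigenvalue of the Hessian of `val_loss`
    w.r.t. the architecture parameters `alphas` by power iteration
    on Hessian-vector products (the full Hessian is never formed)."""
    # Keep the gradient graph so we can differentiate a second time.
    grads = torch.autograd.grad(val_loss, alphas, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])

    # Random normalized starting vector.
    v = torch.randn_like(flat_grad.detach())
    v = v / v.norm()

    eigenvalue = 0.0
    for _ in range(iters):
        # Hessian-vector product: gradient of <flat_grad, v> w.r.t. alphas.
        hv = torch.autograd.grad(flat_grad @ v, alphas, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eigenvalue = (v @ hv).item()   # Rayleigh quotient estimate
        v = hv / (hv.norm() + eps)     # re-normalize for the next step
    return eigenvalue
```

Each Hessian-vector product costs only one extra backward pass, so this tracking can be run periodically during the search without dominating its cost.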
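The paper's early-stopping variant tracks this eigenvalue over the course of the search and stops once its smoothed value grows too quickly. A minimal sketch of such a ratio test, fed by a per-epoch history of the estimate above; the window length and threshold here are illustrative stand-ins, not guaranteed to match the paper's tuned values:

```python
def should_stop(eig_history, window=5, ratio=0.75):
    """Ratio test on the smoothed dominant eigenvalue: stop the search
    once the smoothed eigenvalue from `window` epochs ago is less than
    `ratio` times the current one, i.e. curvature has grown sharply."""
    if len(eig_history) < 2 * window + 1:
        return False  # not enough history to compare smoothed values

    def smoothed(i):
        # Simple moving average over the trailing `window` epochs.
        return sum(eig_history[i - window + 1 : i + 1]) / window

    i = len(eig_history) - 1
    return smoothed(i - window) / smoothed(i) < ratio
```

When the test fires, one would roll the architecture back to the snapshot from `window` epochs earlier, before the curvature blow-up, rather than keep the current, sharper solution.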
Implications
Practical Implications
The practical implications of this work are significant for practitioners in AI and machine learning, specifically in automated machine learning (AutoML). By addressing the robustness of DARTS, the proposed solutions pave the way for its broader application to more diverse and complex real-world tasks. Reducing how often degenerate architectures are selected builds confidence in the consistency and reliability of DARTS and similar NAS methods, which is critical for their adoption in production environments.
Theoretical Implications
From a theoretical perspective, this paper highlights the importance of understanding the geometry of the search landscape in NAS methods. The correlation between the Hessian's dominant eigenvalue and architecture performance gives researchers a concrete, measurable quantity for analyzing NAS efficiency and effectiveness, framing sharp versus flat minima in architecture space as a driver of generalization.
Future Directions
The authors’ insights open multiple avenues for future research. One potential direction is the exploration of automated regularization tuning mechanisms within NAS frameworks to reduce manual intervention and improve adaptivity to various datasets and tasks. Furthermore, extending similar analyses to other NAS methodologies could unveil generalizable principles that enhance NAS performance across different strategies.
In conclusion, this paper not only provides a detailed critique of DARTS but also proposes practical adaptations that enhance its robustness. By dissecting the effect of search-landscape curvature and proposing regularization-based remedies, the authors back their methods with strong empirical support across diverse tasks, setting a new standard for robustness in differentiable NAS methods.