- The paper introduces a curriculum learning paradigm that starts with extensive parameter sharing and progressively reduces it, enhancing performance estimation in one-shot NAS.
- It designs CLOSENet, a supernet that decouples parameters from operations via GLOW blocks and a GATE module, enabling flexible and adaptive parameter sharing during training.
- Empirical evaluations on NAS-Bench-201, NAS-Bench-301, and NDS ResNet benchmarks demonstrate that CLOSE consistently outperforms standard and improved one-shot NAS methods.
Overview of the CLOSE Framework for One-Shot NAS
The paper "CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS" proposes a novel methodology to enhance the one-shot Neural Architecture Search (NAS) framework by addressing the inefficiencies associated with parameter sharing among network architectures. This approach, termed Curriculum Learning on the Sharing Extent (CLOSE), is particularly designed to optimize the training of supernets by dynamically adjusting the extent of parameter sharing during the training process.
Technical Contributions
- Problem Identification: The core issue is the poor correlation between one-shot performance estimates and stand-alone training results, caused largely by excessive parameter sharing among candidate architectures. When too many architectures share the same weights, the supernet cannot learn the distinct characteristics of individual architectures in the search space.
- Curriculum Learning Approach: CLOSE introduces a curriculum learning paradigm in which the supernet begins training with a large sharing extent (many architectures share the same parameters), which simplifies early training. As training progresses, the sharing extent is gradually reduced, allowing the supernet to specialize parameters for individual architectures; a minimal schedule sketch is given after this list.
- CLOSENet Design: To support CLOSE, the authors propose a new supernet architecture, CLOSENet, which decouples parameters from operations. This decoupling enables a flexible, adaptive parameter-sharing scheme that can be adjusted throughout training. Parameters are stored in GLOW blocks, and a GATE module assigns these blocks to operations according to their architectural similarity, computed from graph-based encodings; see the simplified sketch after this list.
- Strong Empirical Validation: Extensive experiments on NAS benchmarks, including NAS-Bench-201, NAS-Bench-301, and the NDS ResNet space, show that CLOSE consistently achieves better ranking quality across experimental setups than both vanilla supernets and improved one-shot methods such as Few-shot NAS and K-shot NAS.
- Architectural Search Advancements: When combined with search strategies such as DARTS and SNAS, CLOSE discovers architectures with higher final performance. It also mitigates the systematic under-estimation of larger architectures that is prevalent in earlier one-shot NAS approaches.
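To make the curriculum idea from the second bullet concrete, below is a minimal sketch of a sharing-extent schedule. It assumes a hypothetical setup in which the sharing extent is controlled by the number of shared parameter blocks; the milestone epochs and block counts are illustrative choices, not the paper's exact values.

```python
def sharing_extent_schedule(epoch, milestones=(100, 200, 300), block_counts=(1, 2, 4, 8)):
    """Return the number of shared parameter blocks to use at a given epoch.

    Fewer blocks means a larger sharing extent (many operations reuse the
    same weights); more blocks means a smaller sharing extent.  Milestones
    and block counts here are illustrative assumptions.
    """
    stage = sum(epoch >= m for m in milestones)
    return block_counts[stage]


for epoch in (0, 150, 250, 350):
    print(epoch, sharing_extent_schedule(epoch))  # 1, 2, 4, 8 active blocks
```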
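To illustrate the decoupling described in the third bullet, here is a highly simplified PyTorch sketch of a CLOSENet-style cell. The class names, the modulo-based GATE, and the averaging over edges are assumptions made for brevity; the paper's GATE instead computes assignments from learned graph-based encodings of the operations.

```python
import torch
import torch.nn as nn


class GLOWBlock(nn.Module):
    """A shared parameter block: holds convolution weights reusable by many operations."""
    def __init__(self, channels):
        super().__init__()
        self.op = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.op(x)


class CLOSENetCell(nn.Module):
    """Toy cell: operations hold no weights; a GATE assigns each edge to a GLOW block."""
    def __init__(self, channels, num_edges, num_blocks):
        super().__init__()
        self.blocks = nn.ModuleList(GLOWBlock(channels) for _ in range(num_blocks))
        self.num_edges = num_edges

    def gate(self, edge_id, num_active_blocks):
        # Placeholder GATE: a fixed modulo assignment.  The paper instead learns
        # assignments from graph-based encodings, grouping architecturally
        # similar operations onto the same block.
        return edge_id % num_active_blocks

    def forward(self, x, num_active_blocks):
        # num_active_blocks comes from the curriculum schedule sketched above.
        out = 0
        for edge_id in range(self.num_edges):
            block = self.blocks[self.gate(edge_id, num_active_blocks)]
            out = out + block(x)
        return out / self.num_edges
```

Because the operations themselves hold no weights, increasing the number of active GLOW blocks over epochs (via the schedule above) shrinks the sharing extent without rebuilding the supernet.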
Implications and Future Directions
The paper makes a practical contribution to one-shot NAS by addressing the central tension of parameter sharing: training efficiency versus estimation accuracy. Its use of curriculum learning over the sharing extent could be generalized beyond NAS, potentially benefiting other training paradigms that face similar issues with parameter sharing or transfer learning.
For future work, more nuanced approaches to adjusting the sharing scheme are possible, for example integrating automated feedback mechanisms that fine-tune the sharing extent dynamically based on real-time performance metrics. Further exploration of CLOSE in other kinds of search spaces, including non-topological and hybrid spaces, would also broaden its applicability and strengthen evidence of its robustness.
In conclusion, CLOSE represents an evolution in NAS methodology, striking a balance between computational efficiency and estimation accuracy and paving the way for faster, more accurate neural architecture search across varied domains.