CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS (2207.07868v1)

Published 16 Jul 2022 in cs.CV

Abstract: One-shot Neural Architecture Search (NAS) has been widely used to discover architectures due to its efficiency. However, previous studies reveal that one-shot performance estimations of architectures might not be well correlated with their performances in stand-alone training because of the excessive sharing of operation parameters (i.e., large sharing extent) between architectures. Thus, recent methods construct even more over-parameterized supernets to reduce the sharing extent. But these improved methods introduce a large number of extra parameters and thus cause an undesirable trade-off between the training costs and the ranking quality. To alleviate the above issues, we propose to apply Curriculum Learning On Sharing Extent (CLOSE) to train the supernet both efficiently and effectively. Specifically, we train the supernet with a large sharing extent (an easier curriculum) at the beginning and gradually decrease the sharing extent of the supernet (a harder curriculum). To support this training strategy, we design a novel supernet (CLOSENet) that decouples the parameters from operations to realize a flexible sharing scheme and adjustable sharing extent. Extensive experiments demonstrate that CLOSE can obtain a better ranking quality across different computational budget constraints than other one-shot supernets, and is able to discover superior architectures when combined with various search strategies. Code is available at https://github.com/walkerning/aw_nas.

Citations (12)

Summary

  • The paper introduces a curriculum learning paradigm that starts with extensive parameter sharing and progressively reduces it, enhancing performance estimation in one-shot NAS.
  • It designs CLOSENet by decoupling parameters from operations using GLOW blocks and a GATE module, enabling flexible and adaptive architecture training.
  • Empirical evaluations on NAS-Bench-201, NAS-Bench-301, and NDS ResNet benchmarks demonstrate that CLOSE consistently outperforms standard and improved one-shot NAS methods.

Overview of the CLOSE Framework for One-Shot NAS

The paper "CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS" proposes a novel methodology to enhance the one-shot Neural Architecture Search (NAS) framework by addressing the inefficiencies associated with parameter sharing among network architectures. This approach, termed Curriculum Learning on the Sharing Extent (CLOSE), is particularly designed to optimize the training of supernets by dynamically adjusting the extent of parameter sharing during the training process.

Technical Contributions

  1. Problem Identification: The central issue is the poor correlation between one-shot performance estimations and stand-alone training results, caused primarily by excessive parameter sharing among candidate architectures. Such excessive sharing prevents the supernet from learning the distinct characteristics of individual architectures within the search space.
  2. Curriculum Learning Approach: CLOSE introduces a curriculum learning paradigm where the supernet begins training with a large parameter sharing extent—in which many architectures share the same parameters—thereby simplifying the initial training process. As training progresses, the sharing extent is reduced, allowing the model to fine-tune parameters more specifically for each architecture.
  3. CLOSENet Design: To support CLOSE, the authors propose a novel supernet architecture, CLOSENet, which decouples parameters from operations. This decoupling allows a flexible parameter sharing scheme whose extent can be adjusted dynamically throughout training. The design stores parameters in GLOW blocks and uses a GATE module to assign these blocks to operations according to their architectural similarities, leveraging a graph-based encoding (a minimal sketch of this decoupling and the curriculum schedule follows this list).
  4. Strong Empirical Validation: Extensive experiments on NAS benchmarks, including NAS-Bench-201, NAS-Bench-301, and the NDS ResNet space, demonstrate that CLOSE consistently achieves better ranking quality across different experimental setups than both vanilla supernets and improved one-shot methods such as Few-shot NAS and K-shot NAS (a short example of how ranking quality is commonly measured appears after the sketch below).
  5. Architectural Search Advancements: When combined with various search strategies, such as DARTS and SNAS, CLOSE discovers architectures with higher performance. Moreover, it alleviates common issues such as the underestimation of larger architectures that is prevalent in earlier one-shot NAS approaches.
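
To make items 2 and 3 concrete, the following is a minimal, hypothetical sketch of the idea: operation parameters live in a small pool of shared blocks (standing in for GLOW blocks), each operation position is mapped to a block by a simple assignment rule (a placeholder for the graph-encoding-based GATE module), and the pool is grown on a schedule so that the sharing extent shrinks as training proceeds. All class, function, and schedule names here are illustrative, not the authors' implementation (which is available in the aw_nas repository).

```python
# Illustrative sketch only: shared parameter pool + curriculum on sharing extent.
import torch
import torch.nn as nn


class SharedBlockPool(nn.Module):
    """Pool of convolutional parameter blocks shared across operation positions."""

    def __init__(self, num_blocks: int, channels: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_blocks)]
        )

    def grow(self, new_num_blocks: int):
        """Decrease the sharing extent by adding blocks to the pool."""
        channels = self.blocks[0].in_channels
        while len(self.blocks) < new_num_blocks:
            new_block = nn.Conv2d(channels, channels, 3, padding=1)
            # Warm-start the new block from an existing one (a simple heuristic
            # standing in for CLOSE's curriculum transition).
            new_block.load_state_dict(self.blocks[0].state_dict())
            self.blocks.append(new_block)

    def forward(self, x: torch.Tensor, block_idx: int) -> torch.Tensor:
        return self.blocks[block_idx](x)


def assign_block(op_position: int, num_blocks: int) -> int:
    """Toy assignment rule: hash the operation position into the pool.

    CLOSENet instead uses a learned, graph-encoding-based GATE module; this
    modulo rule is only a placeholder for "several operations share one block".
    """
    return op_position % num_blocks


def sharing_schedule(epoch: int) -> int:
    """Curriculum: few blocks (large sharing extent) early, more blocks later."""
    milestones = {0: 1, 50: 2, 100: 4, 150: 8}  # hypothetical schedule
    num_blocks = 1
    for start_epoch, n in milestones.items():
        if epoch >= start_epoch:
            num_blocks = n
    return num_blocks


# Usage: grow the pool at curriculum milestones during supernet training.
pool = SharedBlockPool(num_blocks=1, channels=16)
for epoch in range(200):
    target = sharing_schedule(epoch)
    if target > len(pool.blocks):
        pool.grow(target)
    # ... sample architectures and train the supernet here ...

# Route one sampled architecture's operation positions through shared blocks.
x = torch.randn(2, 16, 32, 32)
for op_position in range(6):
    x = pool(x, assign_block(op_position, len(pool.blocks)))
```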

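As a brief aside on item 4, "ranking quality" in one-shot NAS is typically quantified with a rank-correlation measure such as Kendall's tau between the supernet's one-shot accuracy estimates and the architectures' stand-alone accuracies. The snippet below is a minimal sketch with made-up placeholder numbers, not results from the paper.

```python
# Kendall's tau between one-shot estimates and stand-alone accuracies.
from scipy.stats import kendalltau

one_shot_estimates = [0.62, 0.55, 0.70, 0.48, 0.66]      # supernet-evaluated (placeholder)
stand_alone_accuracies = [0.91, 0.88, 0.94, 0.85, 0.90]  # ground truth, e.g. from NAS-Bench-201 (placeholder)

tau, p_value = kendalltau(one_shot_estimates, stand_alone_accuracies)
print(f"Kendall's tau = {tau:.3f} (1.0 means the supernet ranks architectures perfectly)")
```
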
Implications and Future Directions

The paper contributes to one-shot NAS by offering a practical way to manage parameter sharing, balancing training cost against ranking quality. Its use of curriculum learning over the sharing extent could be generalized beyond NAS, potentially benefiting other training paradigms that face similar issues with parameter sharing or transfer learning.

Future work could explore more nuanced ways of adjusting the sharing scheme, for example by integrating automated feedback mechanisms that tune the sharing extent dynamically based on real-time performance signals. Adapting CLOSE to other types of search spaces, including non-topological and hybrid spaces, would further extend its applicability and robustness.

In conclusion, CLOSE represents an evolution in NAS methodology, striking a balance between computational efficiency and estimation quality, and paving the way for faster and more accurate neural architecture search across varied domains.
