Neural Architecture Search using Property Guided Synthesis (2205.03960v3)

Published 8 May 2022 in cs.LG and cs.PL

Abstract: In the past few years, neural architecture search (NAS) has become an increasingly important tool within the deep learning community. Despite the many recent successes of NAS, however, most existing approaches operate within highly structured design spaces, and hence explore only a small fraction of the full search space of neural architectures while also requiring significant manual effort from domain experts. In this work, we develop techniques that enable efficient NAS in a significantly larger design space. To accomplish this, we propose to perform NAS in an abstract search space of program properties. Our key insights are as follows: (1) the abstract search space is significantly smaller than the original search space, and (2) architectures with similar program properties also have similar performance; thus, we can search more efficiently in the abstract search space. To enable this approach, we also propose a novel efficient synthesis procedure, which accepts a set of promising program properties, and returns a satisfying neural architecture. We implement our approach, $\alpha$NAS, within an evolutionary framework, where the mutations are guided by the program properties. Starting with a ResNet-34 model, $\alpha$NAS produces a model with slightly improved accuracy on CIFAR-10 but 96% fewer parameters. On ImageNet, $\alpha$NAS is able to improve over Vision Transformer (30% fewer FLOPS and parameters), ResNet-50 (23% fewer FLOPS, 14% fewer parameters), and EfficientNet (7% fewer FLOPS and parameters) without any degradation in accuracy.

Authors (3)
  1. Charles Jin (7 papers)
  2. Phitchaya Mangpo Phothilimthana (11 papers)
  3. Sudip Roy (12 papers)
Citations (5)

Summary

This paper introduces αNAS, a novel Neural Architecture Search (NAS) method that operates in an abstract search space defined by program properties, rather than directly manipulating the concrete computation graphs of neural networks. The core idea is that searching in this smaller abstract space is more efficient, and architectures with similar properties tend to have similar performance characteristics, providing structure to the search.

Methodology: Property-Guided Evolutionary Search

αNAS employs an evolutionary framework where mutations happen at the property level:

  1. Select Subgraph: Randomly choose a subgraph within an existing neural network architecture.
  2. Infer Properties: Statically analyze the selected subgraph to determine its properties. The key properties defined are:
    • Shape Property: The output tensor shape(s) given the input shape(s). Inferred using framework tools such as JAX's shape inference (see the first sketch after this list).
    • Depth Property: The maximum number of alternating linear and non-linear operations along any path from an input to an output within the subgraph. Operations are pre-classified as linear or non-linear.
    • Mixing Property: Captures the expressivity of the subgraph viewed as a linear operator, composed of:
      • Pairing: Indicates if an input dimension significantly contributes to an output dimension across a slice. Represented as a matrix.
      • Locality: Describes how elements within an input dimension contribute to a single output element (one-to-one, many-to-one, all-to-one).
      • Inference: Concretely inferred using gradients (auto-differentiation). Abstractly inferred by composing property matrices with a matrix multiplication defined over the property values {x, o, m, a} (a sketch of this composition also follows the list).
  3. Mutate Properties: Apply stochastic mutations to the inferred properties. Typically, properties are relaxed (e.g., removing a pairing constraint) to ensure the original subgraph satisfies the mutated properties, making the synthesis task feasible. The depth property might be increased or decreased.
  4. Synthesize Subgraph: Use a novel progressive synthesis algorithm to generate a new subgraph that satisfies the mutated properties.
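
To make the shape property concrete, here is a minimal sketch of inferring output shapes statically with JAX's jax.eval_shape, which traces a function on abstract values without running it. The toy subgraph (a channel-mixing matmul, a ReLU, and stride-2 subsampling) is an illustrative assumption, not an operation set taken from the paper.

```python
import jax
import jax.numpy as jnp

def subgraph(x, w):
    # Hypothetical subgraph: channel-mixing matmul (linear), ReLU (non-linear),
    # then stride-2 spatial subsampling.
    x = jnp.einsum("nhwc,cd->nhwd", x, w)
    x = jax.nn.relu(x)
    return x[:, ::2, ::2, :]

# jax.eval_shape traces the function on abstract values only, so the shape
# property of the subgraph is obtained without any real computation.
x_spec = jax.ShapeDtypeStruct((8, 32, 32, 16), jnp.float32)   # N, H, W, C
w_spec = jax.ShapeDtypeStruct((16, 64), jnp.float32)          # C_in, C_out
out = jax.eval_shape(subgraph, x_spec, w_spec)
print(out.shape, out.dtype)   # (8, 16, 16, 64) float32
```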
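The abstract composition of mixing properties can likewise be sketched as a matrix product over the four property values. The join and compose tables below (ordered x < o < m < a) are illustrative assumptions rather than the paper's exact definitions.

```python
from itertools import product

# Mixing-property values: 'x' = no contribution, 'o' = one-to-one,
# 'm' = many-to-one, 'a' = all-to-one.
ORDER = {"x": 0, "o": 1, "m": 2, "a": 3}

def join(p, q):
    """Combine contributions arriving along parallel paths (take the coarser)."""
    return p if ORDER[p] >= ORDER[q] else q

def compose(p, q):
    """Chain two contributions in sequence; a missing link yields 'x'."""
    return "x" if "x" in (p, q) else join(p, q)

def abstract_matmul(A, B):
    """Compose two mixing matrices (rows = input dims, cols = output dims)."""
    rows, inner, cols = len(A), len(B), len(B[0])
    out = [["x"] * cols for _ in range(rows)]
    for i, j in product(range(rows), range(cols)):
        acc = "x"
        for t in range(inner):
            acc = join(acc, compose(A[i][t], B[t][j]))
        out[i][j] = acc
    return out

# A per-dimension (one-to-one) op followed by a fully mixing op gives
# all-to-one pairings end to end.
A = [["o", "x"],
     ["x", "o"]]
B = [["a", "a"],
     ["a", "a"]]
print(abstract_matmul(A, B))   # [['a', 'a'], ['a', 'a']]
```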

Progressive Synthesis Algorithm

This algorithm efficiently generates a subgraph matching target properties:

  • Goal: Find a sequence of primitive operations (like convolution, activation, pooling) that, when composed, satisfy the target shape, depth, and mixing properties.
  • Mechanism: It iteratively appends primitive operations. At each step, it selects an operation that reduces a defined distance between the current partial subgraph's properties and the target properties (see the sketch after this list).
  • Distance Functions: Defined for each property (e.g., for depth, distance is max(0, target_depth - current_depth)). The total distance is the sum of individual property distances.
  • Covering Set: A pre-defined subset of primitive operations, denoted T, is used. This set is chosen such that for any state where the distance is non-zero, at least one operation in T can reduce the distance (ensuring progress). The property definitions and covering set are designed such that operations typically improve one property without significantly worsening others (monotonicity).
  • Efficiency: Runs in time linear in the length of the synthesized subgraph and the size of the covering set, significantly faster than naive enumerative search (which is exponential).
  • Stochasticity: A variant introduces randomness in operation selection (proportional to distance reduction) to increase diversity.
  • Search Space Compression: Uses a smaller set of representative operations during synthesis (e.g., one type of pooling) and diversifies the specific parameters (e.g., pool size, kernel size) in a post-processing step.
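
The greedy core of the algorithm might look like the sketch below, which tracks only the depth property for brevity; the operation set, state representation, and distance function are simplified assumptions, and the real procedure also scores shape and mixing distances.

```python
import random

# Minimal sketch of the progressive-synthesis loop using only the depth
# property. An op is (name, is_linear); depth counts alternating
# linear/non-linear operations.

def apply_op(state, op):
    depth, last_linear = state
    _, is_linear = op
    if last_linear is None or is_linear != last_linear:
        depth += 1                     # an alternation extends the depth
    return (depth, is_linear)

def distance(state, target_depth):
    return max(0, target_depth - state[0])

def progressive_synthesis(covering_set, target_depth, stochastic=False):
    state, plan = (0, None), []
    while distance(state, target_depth) > 0:
        # Score each op by how much it reduces the remaining distance.
        scored = [(distance(state, target_depth)
                   - distance(apply_op(state, op), target_depth), op)
                  for op in covering_set]
        progress = [(gain, op) for gain, op in scored if gain > 0]
        assert progress, "covering set must always allow progress"
        if stochastic:                 # sample proportionally to the reduction
            gains, candidates = zip(*progress)
            op = random.choices(candidates, weights=gains)[0]
        else:                          # greedy: largest reduction first
            op = max(progress)[1]
        plan.append(op[0])
        state = apply_op(state, op)
    return plan

ops = [("conv3x3", True), ("relu", False), ("avg_pool", True)]
print(progressive_synthesis(ops, target_depth=4))
# e.g. ['relu', 'conv3x3', 'relu', 'conv3x3']
```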

Implementation Details

  • Evolutionary Framework: Uses regularized evolution, selecting parents based on Pareto optimality across multiple objectives (e.g., accuracy vs. FLOPS, accuracy vs. parameters); a selection sketch follows this list.
  • Mutation Targets: Operates on predefined "blocks" within architectures (e.g., ResNet bottlenecks, ViT attention/MLP blocks). Mutations can involve subgraph replacement within a block, block deletion, or block duplication.
  • Environment: Implemented using JAX and run on TPUs.
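
Parent selection over a Pareto front of accuracy, FLOPS, and parameters can be sketched as a non-dominated filter; the dictionary format, objective names, and numbers below are illustrative assumptions, not values from the paper.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly better
    on at least one (maximize accuracy, minimize FLOPS and parameters)."""
    no_worse = (a["acc"] >= b["acc"] and a["flops"] <= b["flops"]
                and a["params"] <= b["params"])
    better = (a["acc"] > b["acc"] or a["flops"] < b["flops"]
              or a["params"] < b["params"])
    return no_worse and better

def pareto_front(population):
    """Keep only candidates not dominated by any other candidate."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]

# Illustrative numbers only: model_b trades accuracy for cost, so it stays on
# the front; model_c is dominated by model_b and is filtered out.
population = [
    {"name": "model_a", "acc": 0.765, "flops": 4.1, "params": 25.6},
    {"name": "model_b", "acc": 0.760, "flops": 3.2, "params": 22.0},
    {"name": "model_c", "acc": 0.741, "flops": 3.5, "params": 24.0},
]
print([p["name"] for p in pareto_front(population)])   # ['model_a', 'model_b']
```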

Experimental Results

  • CIFAR-10: Starting from ResNet-34, αNAS found a model with slightly better accuracy but 96% fewer parameters.
  • ImageNet:
    • Improved ResNet-50: Same accuracy with 23% fewer GFLOPS and 14% fewer parameters.
    • Improved Vision Transformer (ViT-S/16): Same accuracy with 30% fewer GFLOPS and parameters. Found non-trivial mutations replacing standard blocks with more efficient structures.
    • Improved EfficientNet-B0: Same accuracy with 7-8% fewer GFLOPS and parameters. Another model increased accuracy by ~1.7% with fewer parameters but more FLOPS.
  • Ablation Studies:
    • Showed properties are crucial: Random subgraph replacement performed poorly.
    • Demonstrated the efficiency gain of progressive synthesis over naive enumeration (30x faster synthesis time).
    • Confirmed all three properties (shape, depth, mixing) contribute positively.
  • Comparison to Baselines: Outperformed search mechanisms inspired by Primer and AutoML-Zero (which use smaller, concrete-space mutations) in finding Pareto optimal models on CIFAR-10 starting from ResNet-34. Also showed significantly better compression than a compiler-optimization-based NAS approach.

Practical Implications

  • Provides a way to perform NAS in less structured, potentially larger search spaces compared to methods relying on predefined cell structures.
  • The property-guided synthesis allows for larger, more meaningful mutations per step compared to single-operation edits, potentially leading to faster convergence and helping the search escape local optima.
  • The static nature of property inference and the efficiency of progressive synthesis decouple the expensive model training/evaluation from the search proposal step.
  • The defined properties (shape, depth, mixing) offer a structured way to reason about and guide the modification of network architectures based on functional characteristics.
  • The progressive synthesis algorithm is a general technique applicable to goal-directed synthesis problems where distance metrics and monotonic covering sets can be defined.

Code: The implementation is available on GitHub.