The paper "einspace: Searching for Neural Architectures from Fundamental Operations" (Ericsson et al., 31 May 2024
) introduces a novel neural architecture search (NAS) space called einspace
. The core motivation is that existing NAS search spaces are often too rigid and based on high-level operations, limiting the potential for discovering fundamentally new architectures beyond current paradigms like CNNs and Transformers.
The authors propose einspace as a highly expressive search space built from more fundamental operations and defined by a parameterised probabilistic context-free grammar (PCFG). This grammar allows for the flexible composition of operations and structures, enabling the representation of diverse architectures such as ResNets, Transformers, and MLP-Mixers within a single space.
Key components of einspace include:
- Fundamental Operations: These are the basic building blocks, categorised into four groups based on their function:
  - Branching: one-to-many functions for directing information flow (e.g., clone, group-dim).
  - Aggregation: many-to-one functions for merging tensors (e.g., matmul, add, concat).
  - Routing: one-to-one functions for reshaping or reordering tensors without altering information content (e.g., identity, im2col, col2im, permute).
  - Computation: one-to-one functions for altering tensor information (e.g., linear, norm, relu, softmax, pos-enc).
- Macro Structure: Fundamental operations are composed into modules that take one input tensor and produce one output tensor, potentially with internal branching. Four module types are defined: Sequential, Branching, Routing, and Computation. Because CFG production rules are recursive, modules can nest within modules, allowing hierarchical architecture construction (a toy sampling sketch follows this list).
- Feature Mode: To accommodate different architectural styles like ConvNets (operating on 3D image features) and Transformers (operating on 2D token features), einspace introduces two modes: Im mode (3D) and Col mode (2D). Certain routing functions (im2col, col2im) manage transitions between these modes, ensuring compatibility of operations.
- Parameterised Grammar: Parameters are integrated into the grammar rules to enforce structural validity, such as matching tensor shapes between sequential operations or aligning branching and aggregation factors. The sampling process ensures valid parameter combinations are selected.
- Probabilistic Grammar: Probabilities are assigned to production rules to control the complexity and expected depth of sampled architectures. The probability of choosing the terminal Computation module is crucial for ensuring finite derivations, preventing the generation of infinitely large networks (see the sketch below).
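To make the grammar machinery concrete, here is a minimal sketch of sampling from a PCFG with the four module types. The rule set, probabilities, and operation names are illustrative assumptions, not the paper's actual grammar, and the shape/parameter constraints of the real parameterised grammar are omitted:

```python
import random

# Toy PCFG over einspace-style module types. Nonterminal "M" is a module;
# probabilities and operation names are illustrative, not the paper's grammar.
RULES = {
    "M": [
        (0.15, ("seq", "M", "M")),             # Sequential: compose two modules
        (0.15, ("branch2", "M", "M", "add")),  # Branching: clone, process twice, aggregate
        (0.20, ("route", "M")),                # Routing: reshape around an inner module
        (0.50, ("comp",)),                     # Computation: terminal operation
    ],
}

def sample(symbol="M"):
    """Expand a symbol into a derivation tree.

    The expected number of "M" children per expansion is
    0.15*2 + 0.15*2 + 0.20*1 + 0.50*0 = 0.8 < 1, so the branching
    process is subcritical and derivations terminate; raising the
    terminal ("comp") probability shrinks the expected depth.
    """
    if symbol not in RULES:
        return symbol  # terminal symbol, e.g. a concrete operation
    probs, expansions = zip(*RULES[symbol])
    expansion = random.choices(expansions, weights=probs, k=1)[0]
    return [sample(s) for s in expansion]

if __name__ == "__main__":
    print(sample())  # e.g. [['seq', ['comp'], ['route', ['comp']]]]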
For practical implementation, the authors define a standard network structure consisting of a generated backbone followed by a predefined head module (e.g., global pooling and a linear layer for classification). The framework allows different search strategies to navigate einspace.
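As a rough illustration of this fixed scaffolding, a hedged PyTorch sketch follows; the class and argument names are invented here, not the paper's code, and it assumes an Im-mode (3D feature map) backbone:

```python
import torch
import torch.nn as nn

class SearchedClassifier(nn.Module):
    """Hypothetical wrapper: a sampled einspace backbone plus a fixed head.
    Assumes the backbone emits a 3D feature map (Im mode); a Col-mode
    backbone would instead pool over the token dimension."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # global pooling over spatial dims
            nn.Flatten(),
            nn.Linear(feat_dim, num_classes), # linear classification layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))

# Example with a trivial stand-in backbone producing 64 channels.
model = SearchedClassifier(nn.Conv2d(3, 64, 3, padding=1), feat_dim=64, num_classes=10)
logits = model(torch.randn(2, 3, 32, 32))  # -> shape (2, 10)
```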
The experimental evaluation is conducted on the Unseen NAS benchmark [geada_insights_2024], which includes diverse tasks across vision, language, audio, and chess modalities, and a subset of the NAS-Bench-360 [tu_nas-bench-360_2022] benchmark. The authors compare simple search strategies: Random Sampling, Random Search, and Regularised Evolution (RE). They explore RE both starting from scratch and seeded with existing strong architectures such as ResNet18, WideResNet-16-4, ViT, and MLP-Mixer (a minimal sketch of the RE loop follows).
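For reference, the sketch below shows a generic regularised (aging) evolution loop in the style of Real et al. (2019). The callables are placeholders standing in for einspace's grammar sampling, derivation-tree mutation, and train-then-validate fitness; the seeding behaviour is an assumption based on the paper's RE(RN18)/RE(Mix) description:

```python
import copy
import random
from collections import deque

def regularised_evolution(sample_arch, mutate, evaluate,
                          pop_size=50, cycles=500, tournament_size=10, seeds=()):
    """Generic regularised (aging) evolution loop (not the paper's exact code).

    sample_arch() draws a random architecture, mutate(arch) perturbs one,
    and evaluate(arch) returns a fitness score (e.g. validation accuracy).
    """
    population = deque()  # FIFO: the oldest individual dies first
    history = []

    def add(arch):
        individual = (arch, evaluate(arch))
        population.append(individual)
        history.append(individual)

    for arch in seeds:                 # optional warm start from known models
        add(arch)
    while len(population) < pop_size:  # fill the rest with random samples
        add(sample_arch())

    for _ in range(cycles):
        contenders = random.sample(list(population), tournament_size)
        parent = max(contenders, key=lambda ind: ind[1])  # tournament winner
        add(mutate(copy.deepcopy(parent[0])))             # child joins population
        population.popleft()                              # age out the oldest

    return max(history, key=lambda ind: ind[1])           # best ever seen
```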
Key experimental findings include:
- Unlike in less expressive spaces, where random search is often competitive, simple Random Search performs poorly in einspace, highlighting the need for more sophisticated search strategies in expressive spaces.
- Regularised Evolution starting from scratch significantly outperforms random search.
- Initializing the RE search with existing state-of-the-art architectures (RE(RN18) and RE(Mix)) consistently leads to performance improvements over the initial models across all tested datasets.
- On some datasets, particularly those less aligned with traditional computer vision (such as Language and NinaPro), einspace searches find architectures that surpass existing NAS benchmarks and even human-designed expert models.
- The experiments demonstrate the ability of einspace to discover diverse and novel architectural patterns.
The paper acknowledges limitations, primarily the large size and unbounded nature of the search space, which make efficient one-shot NAS methods difficult to apply. Ambiguity in derivation trees and potential redundant compositions are also noted.
In conclusion, the paper successfully introduces einspace, an expressive and versatile NAS search space based on a parameterised PCFG that can represent a wide range of existing architectures and facilitate the discovery of new ones from fundamental operations. The results demonstrate that even simple evolutionary search strategies, especially when initialized with strong priors, can find competitive architectures across diverse tasks, suggesting promising avenues for future research into more intelligent search methods within this flexible space.