- The paper presents a unified framework that integrates neural architecture search with program transformations, achieving over 3x speedup in DNN inference.
- It employs Fisher Potential as a compile-time metric to ensure the legality of transformations without extensive retraining.
- Implementation in TVM demonstrates reduced search times and notable performance gains across various DNN models and hardware platforms.
The paper "Neural Architecture Search as Program Transformation Exploration" by Jack Turner, Elliot J. Crowley, and Michael F.P. O'Boyle investigates a novel approach to enhancing the efficiency of deep neural networks (DNNs) by integrating concepts from both the compiler community and the domain of neural architecture search (NAS). The authors propose a unified framework that views neural architecture operations as program transformations, thus allowing the merging of traditional compiler optimizations with architecture-based transformations for improving DNN performance.
Key Contributions
- Unified Framework: The paper introduces a methodology that integrates NAS techniques with compiler optimizations by expressing neural architecture operations as program transformations. This unification streamlines deployment by interleaving program and neural architecture transformations, unlocking optimization opportunities that are unavailable when the two are treated separately.
- Transformation Legality via Fisher Potential: A significant challenge the authors address is deciding when a neural transformation is legal. Unlike traditional program transformations, whose legality is bound by data dependences, NAS transformations are judged by whether the resulting network retains sufficient representational capacity. Fisher Potential is employed as a compile-time, low-cost proxy for this capacity, allowing damaging network changes to be filtered out without extensive retraining (a sketch of the underlying saliency computation follows this list).
- Implementation in TVM: The framework is prototyped in the TVM optimizing compiler, where architecture-level transformations such as grouping and bottlenecking are explored alongside conventional schedule transformations (a minimal schedule example also follows this list). The authors show that applying these combined transformations yields significant reductions in inference time across a range of DNN architectures and platforms.
- Empirical Results: The evaluation reports speedups of more than 3x in inference time on major DNN models, including ResNet, ResNeXt, and DenseNet, across four hardware platforms. In addition, the unified framework substantially reduces neural architecture search time by discarding unfit configurations early in the process.
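To give a sense of how such a capacity check can be computed, here is a minimal PyTorch sketch of a Fisher-style channel-saliency aggregate obtained from a single forward/backward pass at initialization. The function name, the restriction to Conv2d layers, and the exact normalization are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

def fisher_potential(model, loss_fn, batch):
    """Hedged sketch: aggregate Fisher-style channel saliency from one
    forward/backward pass at initialization (no training involved).

    For each Conv2d activation, the per-channel saliency is
    (sum over spatial positions of activation * gradient)^2, averaged over
    the batch; the network's potential is the sum over all channels.
    The exact normalization used in the paper may differ."""
    activations = []

    def hook(_module, _inputs, output):
        output.retain_grad()            # keep .grad on this intermediate tensor
        activations.append(output)

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.Conv2d)]

    inputs, targets = batch
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()

    total = 0.0
    for act in activations:             # act has shape (N, C, H, W)
        per_sample = (act * act.grad).sum(dim=(2, 3))        # (N, C)
        total += per_sample.pow(2).mean(dim=0).sum().item()  # sum over channels
    for h in handles:
        h.remove()
    return total
```

In the paper's framing, a candidate whose potential collapses relative to the original network can be rejected at compile time, which is how retraining is avoided for configurations that would only be discarded anyway.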
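On the compiler side, the kind of schedule-level transformation TVM exposes can be sketched as below, assuming a TVM release that still provides the classic te scheduling API (newer releases favour TensorIR). The matmul workload and split factor are illustrative, not taken from the paper.

```python
import tvm
from tvm import te

# Illustrative workload: a 1024x1024 matrix multiply declared in TVM's
# tensor-expression language.
N = 1024
A = te.placeholder((N, N), name="A")
B = te.placeholder((N, N), name="B")
k = te.reduce_axis((0, N), name="k")
C = te.compute((N, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

# A classic program transformation: split the outer loop into a 32-wide tile.
s = te.create_schedule(C.op)
io, ii = s[C].split(C.op.axis[0], factor=32)

# Inspect the transformed loop nest.
print(tvm.lower(s, [A, B, C], simple_mode=True))
```

The paper's contribution is to let architectural rewrites such as grouping or bottlenecking sit in the same search space as splits, reorders, and fusions like the one above.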
Background and Methodology
The authors highlight two distinct yet related communities involved in optimizing neural network performance: the neural architecture search community, which seeks efficient network designs, and compiler developers, who optimize a fixed network to exploit hardware capabilities. By viewing neural architecture operations through the lens of program transformations, the paper proposes a systematic exploration of both kinds of transformation as part of a single optimization strategy.
The polyhedral model, widely used in compilers to optimize loop nests and memory access patterns, is extended to encompass NAS transformations: bottlenecking, grouping, and depthwise convolution are expressed as affine transformations within this model, bridging compiler optimizations and structural changes to the network. To decide which transformed networks are worth keeping, the authors use Fisher Potential, a measure rooted in Fisher information, to assess the legality of transformations without retraining the network.
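To make the affine view concrete, here is a deliberately naive NumPy sketch contrasting a direct convolution loop nest with a grouped variant: grouping amounts to an affine restriction of the input-channel loop's iteration domain (bottlenecking would similarly shrink the channel extents). The shapes, stride-1/no-padding setting, and the assumption that channel counts divide evenly by the group count are for illustration only; this is not the paper's polyhedral encoding.

```python
import numpy as np

def conv2d_direct(x, w):
    """Direct convolution: x is (C_in, H, W), w is (C_out, C_in, K, K)."""
    c_out, c_in, kk, _ = w.shape
    _, h, wd = x.shape
    y = np.zeros((c_out, h - kk + 1, wd - kk + 1))
    for o in range(c_out):
        for c in range(c_in):                       # full reduction over input channels
            for i in range(y.shape[1]):
                for j in range(y.shape[2]):
                    y[o, i, j] += np.sum(x[c, i:i + kk, j:j + kk] * w[o, c])
    return y

def conv2d_grouped(x, w, groups):
    """Same loop nest with grouping: output channel o only reads input channels
    in its own group, i.e. the affine constraint
        floor(o / (C_out / groups)) == floor(c / (C_in / groups))."""
    c_out, c_in, kk, _ = w.shape
    _, h, wd = x.shape
    y = np.zeros((c_out, h - kk + 1, wd - kk + 1))
    go, gi = c_out // groups, c_in // groups        # assumes exact divisibility
    for o in range(c_out):
        g = o // go
        for c in range(g * gi, (g + 1) * gi):       # restricted reduction domain
            for i in range(y.shape[1]):
                for j in range(y.shape[2]):
                    y[o, i, j] += np.sum(x[c, i:i + kk, j:j + kk] * w[o, c])
    return y
```

Because the restriction is expressed purely on the iteration domain, it composes naturally with standard loop transformations such as tiling, fusion, and vectorization, which is what makes the joint exploration possible.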
Implications and Future Research
This approach has substantial implications for both the practical deployment and the theoretical development of AI systems. The unified framework provides a pathway toward more agile and efficient neural models that conserve resources while preserving accuracy. Future work might extend the framework to classes of neural architectures beyond convolutional networks and explore alternative proxies for representational capacity that make the enlarged search space more tractable to explore.
Moreover, this research invites further refinement of Fisher Potential or similar metrics to become more predictive of post-training accuracies in varied neural architectures. The integration of such measures could make NAS substantially more efficient, contributing to the growing trend towards automating network design and optimization processes in deep learning frameworks.
In summary, the paper advances the idea that systematic transformation exploration unifies two traditionally siloed areas, providing a comprehensive strategy for DNN optimization and a foundation for future AI development and deployment.