MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations (2105.07085v2)

Published 14 May 2021 in cs.CV

Abstract: Most existing deep neural networks are static, which means they can only do inference at a fixed complexity. But the resource budget can vary substantially across different devices. Even on a single device, the affordable budget can change with different scenarios, and repeatedly training networks for each required budget would be incredibly expensive. Therefore, in this work, we propose a general method called MutualNet to train a single network that can run at a diverse set of resource constraints. Our method trains a cohort of model configurations with various network widths and input resolutions. This mutual learning scheme not only allows the model to run at different width-resolution configurations but also transfers the unique knowledge among these configurations, helping the model to learn stronger representations overall. MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet, 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets. Since we only train the model once, it also greatly reduces the training cost compared to independently training several models. Surprisingly, MutualNet can also be used to significantly boost the performance of a single network, if dynamic resource constraint is not a concern. In summary, MutualNet is a unified method for both static and adaptive, 2D and 3D networks. Codes and pre-trained models are available at \url{https://github.com/taoyang1122/MutualNet}.

Citations (22)

Summary

  • The paper demonstrates that MutualNet significantly outperforms its counterparts by achieving higher accuracy with fewer FLOPs.
  • The study shows that incorporating KL-divergence in training leads to modest accuracy improvements in specific configurations.
  • The findings indicate MutualNet's efficient architecture is well-suited for resource-constrained applications like mobile and embedded systems.

An Analysis of MutualNet Performance

The paper under consideration provides a comparative statistical analysis of the MutualNet architecture against other adaptive neural network baselines, specifically US-Net and I-Net (independently trained networks). The central focus is to evaluate MutualNet's accuracy and computational cost, measured in floating-point operations (FLOPs; the figures quoted below are in millions, i.e., MFLOPs), across a range of width-resolution configurations and model sizes.
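
Because the comparison hinges on a single network that is trained once and then evaluated at many width-resolution configurations, it helps to sketch what one mutual-learning training step looks like. The following is a minimal PyTorch-style illustration, not the authors' implementation (see the linked repository for that): it assumes a slimmable backbone exposing a hypothetical set_width(w) switch, samples sub-network widths together with random input resolutions, and distills each sub-configuration from the full-width, full-resolution output.

    # Minimal sketch of one mutual-learning training step. Assumptions: a slimmable
    # backbone with a hypothetical set_width(w) switch; the paper's exact sampling
    # schedule and loss weighting are not reproduced here.
    import random
    import torch.nn.functional as F

    WIDTHS = (0.25, 1.0)                # lower and upper width multipliers
    RESOLUTIONS = (224, 192, 160, 128)  # candidate input resolutions

    def train_step(model, optimizer, images, labels, n_sub=3):
        optimizer.zero_grad()

        # Full-width, full-resolution pass supervised by the ground-truth labels.
        model.set_width(WIDTHS[1])                      # hypothetical API
        full_logits = model(images)                     # images assumed to be 224x224
        loss = F.cross_entropy(full_logits, labels)
        soft_target = full_logits.detach().softmax(dim=1)

        # Sub-networks: smallest width plus randomly sampled widths, each paired
        # with a randomly chosen input resolution and trained to match the full
        # network's predictions via KL divergence.
        widths = [WIDTHS[0]] + [random.uniform(*WIDTHS) for _ in range(n_sub - 1)]
        for w in widths:
            model.set_width(w)
            res = random.choice(RESOLUTIONS)
            x = F.interpolate(images, size=res, mode='bilinear', align_corners=False)
            sub_logits = model(x)
            loss = loss + F.kl_div(sub_logits.log_softmax(dim=1), soft_target,
                                   reduction='batchmean')

        loss.backward()
        optimizer.step()
        return loss.item()

At inference time the same weights are reused: one simply fixes a width multiplier and input resolution that fit the current budget, which is what the width-resolution configurations in the results below (e.g., 1.0-224, 0.25-160) refer to.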

Performance Metrics

The authors present a detailed tabulation of the accuracy (%) achieved by each model configuration at a given computational budget (MFLOPs). In the reported data:

  • MutualNet consistently outperforms US-Net and I-Net in most of the configurations presented.
  • Notably, at the 1.0-224 configuration, MutualNet achieves 72.4% and 72.9% accuracy at 569 MFLOPs and 300 MFLOPs, respectively, surpassing both US-Net and I-Net at those operating points.

A particular point of interest emerges at low computational budgets. For example, MutualNet reaches 50.1% accuracy with only 21 MFLOPs in the 0.25-160 configuration, whereas US-Net records only 33.8% at the same cost. This is a substantial accuracy gain under a tight compute budget.
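
The width-resolution notation used above (e.g., 0.25-160 means width multiplier 0.25 and input resolution 160x160) also explains why the budgets differ so sharply: for convolutional layers, compute grows roughly with the square of the width multiplier and the square of the input resolution. The back-of-the-envelope check below takes the reported 569 MFLOPs for 1.0-224 as the reference point; exact counts depend on the layer structure (depthwise convolutions, the classifier head, etc.), so this is an approximation, not the paper's accounting.

    # Rough rule of thumb: conv FLOPs scale ~quadratically with the width multiplier
    # and with the input resolution. Treat this as an approximation only.
    def approx_mflops(width, resolution, base_mflops=569.0, base_res=224):
        return base_mflops * width**2 * (resolution / base_res) ** 2

    print(approx_mflops(1.0, 224))   # 569.0  (reference configuration)
    print(approx_mflops(0.25, 160))  # ~18, in the same ballpark as the reported 21 MFLOPs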

Implications of KL-Divergence

Another noteworthy aspect of the research is the evaluation of adding a Kullback-Leibler (KL) divergence term to the training objective (a minimal sketch of such a term follows the example below). The results indicate modest accuracy improvements when the KL term is applied:

  • For instance, in the 0.95-224 configuration at 518 MFLOPs, accuracy improved by 0.2% with the KL term.
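
For concreteness, the KL term referenced above is, in spirit, a distillation loss: a sub-configuration's predictions are pulled toward the full network's softened output distribution. The sketch below isolates that term from the training-step example earlier; the temperature parameter is included only for generality (with temperature 1 it reduces to the plain KL term), and the paper's exact formulation and weighting may differ.

    import torch.nn.functional as F

    def kl_distillation_loss(student_logits, teacher_logits, temperature=1.0):
        # teacher_logits: full-width, full-resolution configuration (treated as fixed);
        # student_logits: a sub-configuration of the same network.
        t = temperature
        soft_targets = (teacher_logits.detach() / t).softmax(dim=1)
        log_probs = (student_logits / t).log_softmax(dim=1)
        return F.kl_div(log_probs, soft_targets, reduction='batchmean') * (t * t)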

Comparative Analysis with I-Net

The comparison with I-Net further underscores MutualNet's effectiveness. At the 1.0-224 configuration with 4089 MFLOPs, MutualNet achieves 78.1% accuracy, whereas I-Net reaches 76.4%. In more resource-constrained settings (e.g., 0.75-128 at 15 MFLOPs), MutualNet also maintains higher accuracy (56.5%) than I-Net (48.3%).

Theoretical and Practical Implications

The empirical analysis indicates that MutualNet maintains an accuracy edge across numerous test cases while using similar or lower computational budgets. This has clear implications for deployments where efficiency and accuracy must be balanced, such as mobile or embedded systems, and it motivates revisiting architectural choices when state-of-the-art networks must operate under power and processing constraints.

Speculation on Future Developments

Given the quantitative advantages demonstrated here, future work could explore MutualNet's scalability to larger datasets and its adaptability to tasks beyond those presented. The observed effect of the KL-divergence term also suggests room to further optimize the training dynamics, potentially opening new directions in model training methodology.

In conclusion, this analysis showcases the efficacy of MutualNet over the compared baselines across a wide range of computational budgets, with implications for both the theory and the practical design of efficient machine learning models.
