
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (1602.07360v4)

Published 24 Feb 2016 in cs.CV and cs.AI

Abstract: Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet

An Expert's Perspective on "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size"

The paper "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size" presents a significant advancement in the architecture of convolutional neural networks (CNNs), addressing the critical challenge of model size without sacrificing accuracy. This work is of particular importance given the increasing deployment of CNNs in resource-constrained environments, such as autonomous vehicles and embedded systems.

Key Contributions and Findings

The primary contribution of the paper is the introduction of SqueezeNet, a novel small CNN architecture that achieves AlexNet-level accuracy on the ImageNet dataset while using 50 times fewer parameters. SqueezeNet employs three main strategies to achieve this feat:

  1. Using 1x1 Filters: Replacing 3x3 filters with 1x1 filters significantly reduces the number of parameters, since a 1x1 filter has only one-ninth the parameters of a 3x3 filter.
  2. Decreased Input Channels to 3x3 Filters: The introduction of "squeeze layers" reduces the number of input channels to the 3x3 filters, thereby decreasing the total parameter count.
  3. Delayed Downsampling: Performing downsampling operations later in the network ensures that convolution layers have larger activation maps, contributing to higher classification accuracy.
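The saving behind the first strategy is simple arithmetic: for the same input and output channel counts, a k x k filter bank carries k-squared times the weights of a 1x1 bank. A minimal sketch (the channel counts here are illustrative, not taken from the paper):

```python
def conv_weights(c_in, c_out, k):
    """Weight count of a k x k convolution layer (biases ignored)."""
    return c_in * c_out * k * k

# For identical channel counts, 3x3 filters carry 9x the weights of
# 1x1 filters -- the factor exploited by Strategy 1.
ratio = conv_weights(64, 64, 3) / conv_weights(64, 64, 1)
print(ratio)  # 9.0
```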

The SqueezeNet architecture is composed primarily of Fire modules, each consisting of a squeeze layer (1x1 filters) followed by an expand layer containing a mix of 1x1 and 3x3 filters. Combining these design principles, SqueezeNet matches AlexNet's accuracy while reducing the model size to 4.8MB. Furthermore, applying model compression techniques such as Deep Compression shrinks the model to less than 0.5MB, a 510-fold reduction compared to AlexNet.
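Under these design rules, a Fire module's parameter count can be tallied directly. The sketch below uses the filter counts the paper lists for the fire2 module (16 squeeze 1x1 filters, 64 expand 1x1 and 64 expand 3x3 filters on a 96-channel input); the comparison against a plain 3x3 layer is our own illustration:

```python
def fire_weights(c_in, s1x1, e1x1, e3x3):
    """Weights in a Fire module: 1x1 squeeze, then parallel 1x1 and 3x3
    expand filters (biases ignored)."""
    squeeze = c_in * s1x1                     # 1x1 squeeze layer
    expand = s1x1 * e1x1 + s1x1 * e3x3 * 9    # 1x1 + 3x3 expand layers
    return squeeze + expand

fire2 = fire_weights(96, 16, 64, 64)   # filter counts from the paper's fire2
plain = 96 * 128 * 9                   # a plain 3x3 conv, 96 -> 128 channels
print(fire2, plain)  # 11776 110592 -- roughly a 9x saving at equal width
```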

Strong Numerical Results

The empirical results underscore the efficacy of SqueezeNet:

  • Baseline SqueezeNet Model: Achieves 57.5% top-1 accuracy and 80.3% top-5 accuracy on the ImageNet dataset with a model size of only 4.8MB.
  • Compressed SqueezeNet Model: With Deep Compression, the model size is reduced to 0.66MB and further to 0.47MB, retaining the same accuracy levels. This represents a substantial compression ratio of up to 510 times compared to the original AlexNet.
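The headline ratios follow directly from the reported sizes (the ~240MB uncompressed AlexNet size is the figure the paper uses as its baseline):

```python
alexnet_mb = 240.0     # uncompressed AlexNet (the paper's baseline)
squeezenet_mb = 4.8    # baseline SqueezeNet
compressed_mb = 0.47   # SqueezeNet + Deep Compression, 6-bit weights

print(int(alexnet_mb / squeezenet_mb))  # 50  -- from architecture alone
print(int(alexnet_mb / compressed_mb))  # 510 -- the paper's 510x figure
```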

Additionally, the paper explores several design dimensions:

  • Squeeze Ratio (SR): The ratio of squeeze filters to expand filters governs both model size and accuracy. Increasing SR raises accuracy up to a point (around SR = 0.75 in the paper's experiments) before plateauing, at the cost of a proportionally larger model.
  • Filter Dimensionality: Varying the proportion of 3x3 filters within the expand layers shows that a balanced mix of 1x1 and 3x3 filters yields the best accuracy without excessively increasing model size.
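The squeeze-ratio trade-off can be made concrete. Here SR = s1x1 / (e1x1 + e3x3), and the sketch sweeps SR for a single illustrative Fire module; the input and expand channel counts are our own choices, not a reproduction of the paper's full experiment:

```python
def fire_weights(c_in, s1x1, e1x1, e3x3):
    """Weights in a Fire module: 1x1 squeeze plus 1x1/3x3 expand (biases ignored)."""
    return c_in * s1x1 + s1x1 * e1x1 + s1x1 * e3x3 * 9

expand = 128  # total expand filters, split evenly between 1x1 and 3x3
for sr in (0.125, 0.25, 0.5, 0.75, 1.0):
    s = int(sr * expand)  # squeeze filters implied by the squeeze ratio
    print(f"SR={sr:<5} weights={fire_weights(96, s, expand // 2, expand // 2)}")
```

Parameter count grows linearly with SR while the reported accuracy gains flatten out, which is why the paper settles on small squeeze layers.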

Implications and Future Directions

The practical implications of SqueezeNet are profound, particularly for deploying CNNs in environments with limited computational resources and bandwidth:

  • Distributed Training: Smaller models reduce the communication overhead during distributed training, enhancing scalability.
  • On-chip Deployment: The significantly reduced model size enables feasible on-chip deployment on FPGAs and ASICs, avoiding memory bandwidth bottlenecks and potentially reducing power consumption.
  • Over-the-air Updates: For applications in autonomous driving, smaller models minimize the bandwidth required for over-the-air updates, facilitating more frequent and agile improvements in deployed models.

Theoretically, this work pushes the boundaries of CNN architecture design, emphasizing the importance of a disciplined architecture exploration to balance parameter efficiency and performance. Future research could further investigate and refine design space exploration techniques, potentially integrating automated neural architecture search (NAS) with principles borrowed from SqueezeNet.

Conclusion

"SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size" contributes a valuable perspective on CNN architecture design, demonstrating the viability of small yet powerful CNNs. The methodological approach combines architectural innovation with compression techniques to achieve remarkable efficiency, laying the groundwork for broader applications in AI, where computational resources are constrained. The findings suggest extensive possibilities in optimizing and deploying compact CNNs across various domains, heralding a future with more efficient, scalable, and accessible deep learning models.

Authors (6)
  1. Forrest N. Iandola (6 papers)
  2. Song Han (155 papers)
  3. Matthew W. Moskewicz (4 papers)
  4. Khalid Ashraf (6 papers)
  5. William J. Dally (21 papers)
  6. Kurt Keutzer (200 papers)
Citations (7,130)