
Towards Efficient Model Compression via Learned Global Ranking (1904.12368v2)

Published 28 Apr 2019 in cs.CV and cs.LG

Abstract: Pruning convolutional filters has demonstrated its effectiveness in compressing ConvNets. Prior art in filter pruning requires users to specify a target model complexity (e.g., model size or FLOP count) for the resulting architecture. However, determining a target model complexity can be difficult for optimizing various embodied AI applications such as autonomous robots, drones, and user-facing applications. First, both the accuracy and the speed of ConvNets can affect the performance of the application. Second, the performance of the application can be hard to assess without evaluating ConvNets during inference. As a consequence, finding a sweet-spot between the accuracy and speed via filter pruning, which needs to be done in a trial-and-error fashion, can be time-consuming. This work takes a first step toward making this process more efficient by altering the goal of model compression to producing a set of ConvNets with various accuracy and latency trade-offs instead of producing one ConvNet targeting some pre-defined latency constraint. To this end, we propose to learn a global ranking of the filters across different layers of the ConvNet, which is used to obtain a set of ConvNet architectures that have different accuracy/latency trade-offs by pruning the bottom-ranked filters. Our proposed algorithm, LeGR, is shown to be 2x to 3x faster than prior work while having comparable or better performance when targeting seven pruned ResNet-56 with different accuracy/FLOPs profiles on the CIFAR-100 dataset. Additionally, we have evaluated LeGR on ImageNet and Bird-200 with ResNet-50 and MobileNetV2 to demonstrate its effectiveness. Code available at https://github.com/cmu-enyac/LeGR.

Overview of the Paper: Efficient Model Compression via Learned Global Ranking

The paper, "Towards Efficient Model Compression via Learned Global Ranking," presents a novel approach to convolutional network (ConvNet) compression, specifically targeting the optimization of accuracy and speed trade-offs. This research is pertinent in the context of embodied AI applications, such as autonomous robots and drones, where computational resources are frequently limited. The cornerstone of the approach is an algorithm, termed Learned Global Ranking (LeGR), designed to produce multiple ConvNet architectures with varied accuracy and latency profiles by dynamically ranking convolutional filters across layers.

Key Contributions and Methodology

Researchers have proposed many methods for model compression, primarily focusing on filter pruning, whereby redundant or less critical convolutional filters are removed to enhance speed without significantly compromising accuracy. However, most traditional methods face challenges when users need to specify a targeted model complexity or optimize for a generalized application where both speed and accuracy are crucial. The existing approaches typically necessitate extensive trial-and-error to arrive at a suitable model complexity, an impractical endeavor in many settings.

LeGR seeks to address these challenges by eschewing the singular focus on a predetermined model complexity, instead generating a spectrum of ConvNets that afford different balances between accuracy and speed. The main innovations introduced are as follows:

  1. Global Ranking of Filters: Unlike past methods that rank filters only within each layer, LeGR evaluates the importance of filters globally across the network, employing learned layer-wise affine transformations on filter norms. Because a single learned ranking suffices, ConvNets of varying complexities can be obtained simply by thresholding that ranking at different points, rather than re-running the pruning procedure per target.
  2. Subset Assumption: The authors introduce a "subset assumption," whereby a smaller pruned ConvNet is considered a subset of larger ConvNets, facilitating the learning of a global ranking applicable across ConvNets of different scales.
  3. Efficiency and Practicality: LeGR significantly reduces the computational overhead traditionally associated with iterative pruning and fine-tuning. Empirical results show that the algorithm is 2x to 3x faster than prior methods while maintaining or improving accuracy. Experiments on the CIFAR-10/100, ImageNet, and Bird-200 datasets demonstrate LeGR's competitive performance and practical applicability.
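The ranking and pruning steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `alphas` and `kappas` stand in for the learned layer-wise affine parameters (found via an evolutionary search in the paper), and the function names are chosen here for clarity.

```python
import numpy as np

def global_filter_ranking(layer_weights, alphas, kappas):
    """Score every filter with a layer-wise affine transform of its
    squared L2 norm, then rank all filters globally.

    layer_weights: one array per conv layer, with axis 0 indexing filters.
    alphas, kappas: learned per-layer scale/shift pairs (assumed given).
    Returns (layer_idx, filter_idx) pairs, least important first.
    """
    scored = []
    for l, (w, a, k) in enumerate(zip(layer_weights, alphas, kappas)):
        norms = np.sum(w.reshape(w.shape[0], -1) ** 2, axis=1)
        for f, n in enumerate(norms):
            scored.append((a * n + k, l, f))
    scored.sort()  # ascending: bottom-ranked filters come first
    return [(l, f) for _, l, f in scored]

def surviving_filters(ranking, prune_fraction):
    """Drop the bottom `prune_fraction` of the global ranking.
    Sweeping the fraction yields nested architectures, which is
    exactly the subset assumption: a smaller pruned ConvNet's
    filters are a subset of any larger one's."""
    cut = int(len(ranking) * prune_fraction)
    return set(ranking[cut:])
```

For example, `surviving_filters(ranking, 0.5)` is always a subset of `surviving_filters(ranking, 0.2)`, so the family of pruned models produced by one ranking is nested by construction.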

Empirical Results and Implications

Experiments were conducted on widely recognized datasets, namely CIFAR-100, ImageNet, and Bird-200, with architectures such as ResNet and MobileNetV2. Key findings can be summarized as follows:

  • In comparative benchmarks against existing methods like MorphNet, AMC, FisherPruning, and uniform pruning approaches, LeGR consistently outperformed or matched the state-of-the-art, particularly in low-FLOP-count regimes.
  • The methodology demonstrated the ability to produce an effective Pareto curve (accuracy vs. FLOP/latency trade-off), simplifying model selection for developers deploying ConvNets on resource-constrained devices.
  • With respect to transfer learning applications, where models pre-trained on large datasets (e.g., ImageNet) are adapted to smaller datasets, LeGR provided robust performance improvements.
  • The approach outlined offers practical implications for developers deploying AI in real-world applications, enabling more efficient exploration of model configurations suitable for varied computational and performance constraints.
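Selecting a point on the accuracy/FLOP Pareto curve requires a FLOP estimate for each candidate pruned configuration. The sketch below shows one simple way to compute this; it assumes a plain chain of convolutions (a simplification of the actual ResNet/MobileNetV2 topologies, which have skip connections) and illustrative function names.

```python
def conv_flops(c_in, c_out, k, h_out, w_out):
    # Multiply-accumulate count of one k x k convolution layer.
    return c_in * c_out * k * k * h_out * w_out

def flops_after_pruning(layer_shapes, keep_counts, input_channels=3):
    """Total FLOPs of a pruned chain of convolutions.

    layer_shapes: per layer, (kernel_size, out_height, out_width).
    keep_counts:  surviving output channels per layer; the next layer's
                  input channels equal this layer's surviving outputs.
    """
    total, c_in = 0, input_channels
    for (k, h, w), c_out in zip(layer_shapes, keep_counts):
        total += conv_flops(c_in, c_out, k, h, w)
        c_in = c_out
    return total
```

Sweeping the pruning threshold, computing `flops_after_pruning` for each resulting channel configuration, and fine-tuning briefly at each point is one way to realize the trade-off curve the paper describes.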

Future Directions and Conclusions

The LeGR approach opens up several avenues for future research, including potential adaptations to other network architectures and exploring the integration of learned rankings in automated neural architecture search frameworks. Additionally, further work may extend the current method to handle adaptive or real-time adjustments in deployed environments where model requirements change dynamically.

By shifting the perspective from a singular optimized model to a set of models embodying different trade-offs, the methodology outlined in this paper could significantly contribute to developing scalable, flexible AI systems suited to the demands of contemporary and future embodied AI applications.

Authors (4)
  1. Ting-Wu Chin
  2. Ruizhou Ding
  3. Cha Zhang
  4. Diana Marculescu
Citations (159)