
SlimSAM: 0.1% Data Makes Segment Anything Slim (2312.05284v4)

Published 8 Dec 2023 in cs.CV

Abstract: Current approaches for compressing the Segment Anything Model (SAM) yield commendable results, yet necessitate extensive data to train a new network from scratch. Conventional pruning techniques can markedly reduce data requirements, but at the cost of degraded performance. To address this challenging trade-off, we introduce SlimSAM, a novel data-efficient SAM compression method that achieves superior performance with substantially less training data. The essence of SlimSAM is the alternate slimming framework, which effectively enhances knowledge inheritance under severely limited training data and exceptional pruning ratios. Diverging from prior techniques, our framework progressively compresses the model by alternately pruning and distilling distinct, decoupled sub-structures. Disturbed Taylor pruning is also proposed to address the misalignment between the pruning objective and the training target, thereby improving post-distillation recovery. SlimSAM yields significant performance improvements while demanding over 10 times less training data than any other existing compression method. Even compared to the original SAM, SlimSAM achieves comparable performance while reducing parameter counts to merely 1.4% (9.1M), MACs to 0.8% (23G), and requiring only 0.1% (10k) of the SAM training data. The code is available at http://github.com/czg1225/SlimSAM.


Summary

  • The paper introduces SlimSAM, a framework that compresses the Segment Anything Model using only 0.1% of the original training data.
  • It employs an alternate slimming framework with disturbed Taylor pruning to balance model size reduction and segmentation performance.
  • Experimental results show significant parameter and computational reductions, enabling deployment on resource-constrained devices.

SlimSAM: A Data-Efficient Approach to Compression of Segment Anything Model

The paper, titled "SlimSAM: 0.1% Data Makes Segment Anything Slim," introduces an innovative approach to compressing the Segment Anything Model (SAM) with significantly reduced training data. The work addresses the challenge of maintaining model performance while minimizing computational and data requirements, offering a practical solution for deploying SAM on resource-limited devices.

Key Contributions

The central contribution of this work is the SlimSAM framework, which reduces the required training data to only 0.1% of the original SAM dataset. The authors propose an alternate slimming framework that alternates between pruning and distillation, effectively managing the trade-off between model size and performance. This approach is augmented by a novel pruning method, disturbed Taylor pruning, which aligns the pruning objective with the training target to enhance post-distillation recovery.

Technical Approach

  • Alternate Slimming Framework: The framework divides the SAM image encoder into decoupled sub-structures (embedding dimensions and bottleneck dimensions) and prunes them in alternation, distilling after each pruning step. Handling one sub-structure at a time limits deviation from the original model and enables efficient intermediate feature alignment.
  • Disturbed Taylor Pruning: This method estimates parameter importance from the divergence between the student's outputs and the teacher's soft labels. Because the unpruned student initially matches the teacher, that divergence and its gradients would be zero; injecting Gaussian noise into the soft labels produces non-zero gradients for importance estimation without requiring hard labels, resolving the traditional misalignment between pruning objectives and training targets (see the sketch after this list).
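
To make the two ideas above concrete, here is a minimal PyTorch-style sketch rather than the authors' implementation: `prune_substructure` and `distill` are hypothetical placeholders for structural channel removal and feature-level distillation, `loader` is assumed to yield image batches, and the MSE loss on noise-disturbed soft labels is a simplified stand-in for the paper's objective.

```python
import torch
import torch.nn.functional as F

def disturbed_taylor_scores(student, teacher, images, sigma=1e-4):
    """First-order Taylor importance driven by the distillation loss.

    Before pruning, the student is a copy of the teacher, so the plain
    soft-label loss (and its gradients) would be exactly zero; disturbing
    the soft labels with Gaussian noise yields usable non-zero gradients.
    """
    student.zero_grad()
    with torch.no_grad():
        soft_labels = teacher(images)
    disturbed = soft_labels + sigma * torch.randn_like(soft_labels)
    F.mse_loss(student(images), disturbed).backward()

    scores = {}
    for name, p in student.named_parameters():
        if p.grad is not None and p.dim() > 1:
            # |w * dL/dw|, aggregated over all axes except output channels.
            scores[name] = (p * p.grad).abs().sum(dim=tuple(range(1, p.dim())))
    return scores

def alternate_slimming(student, teacher, loader, ratio, distill_steps):
    """Prune and distill the two decoupled sub-structures in alternation."""
    for substructure in ("embedding", "bottleneck"):
        images = next(iter(loader))
        scores = disturbed_taylor_scores(student, teacher, images)
        prune_substructure(student, substructure, scores, ratio)  # hypothetical helper
        distill(student, teacher, loader, steps=distill_steps)    # hypothetical helper
    return student
```

The key design choice reflected here is that importance estimation uses the same soft-label objective that later drives distillation, so the channels kept are exactly those that matter for the recovery phase.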

Experimental Results

SlimSAM demonstrates marked improvements over existing methods, achieving large reductions in both parameter count and computation. It compresses the original SAM to 1.4% of its parameters (9.1M) and 0.8% of its MACs (23G) while training on only 10k unlabeled images, less than one-tenth of the data required by other compression techniques.
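
Those two percentages also fix the implied scale of the uncompressed model; a quick back-of-the-envelope check using only the figures quoted above:

```python
# Derived purely from the numbers above: 1.4% of original parameters,
# 0.8% of original MACs.
slim_params, slim_macs = 9.1e6, 23e9
orig_params = slim_params / 0.014   # ≈ 650M parameters (SAM ViT-H scale)
orig_macs = slim_macs / 0.008       # ≈ 2,875G MACs
print(f"implied original SAM: {orig_params / 1e6:.0f}M params, "
      f"{orig_macs / 1e9:.0f}G MACs")
```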

Comparative Analysis

The results highlight SlimSAM's superior performance and efficiency. When tested against other SAM compression methods, SlimSAM consistently yields higher Mean Intersection over Union (MIoU) scores, particularly under severe data constraints, surpassing models such as FastSAM, MobileSAM, and EfficientSAM, which require substantially larger training sets.
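
For context, the MIoU figures compared here average per-mask intersection over union between predicted and ground-truth masks; a minimal NumPy sketch of the metric (not the paper's evaluation code):

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two boolean segmentation masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:            # both masks empty: count as a perfect match
        return 1.0
    return np.logical_and(pred, gt).sum() / union

def miou(preds, gts):
    """Mean IoU over paired lists of predicted and ground-truth masks."""
    return float(np.mean([mask_iou(p, g) for p, g in zip(preds, gts)]))
```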

Implications and Future Work

The implications of this research are substantial for the deployment of SAM-based models in environments where data and computational resources are constrained. The SlimSAM approach not only preserves the robust segmentation capabilities of SAM but also opens avenues for its application on edge devices.

Future work could explore extending the SlimSAM framework to other large-scale models and further refining the pruning-distillation process to achieve even greater efficiency. Moreover, investigating the potential of SlimSAM in various real-world scenarios could provide additional insights into its practical applications.

Conclusion

This paper presents a significant advance in model compression, broadening the applicability of SAM through an innovative, data-efficient method. The combination of alternate slimming and disturbed Taylor pruning offers a powerful toolkit for compressing large models while keeping their performance largely intact with minimal training resources.
