Towards Complementary Knowledge Distillation for Efficient Dense Image Prediction (2401.13174v4)

Published 24 Jan 2024 in cs.CV

Abstract: Small efficient dense image prediction (EDIP) models trained within the knowledge distillation (KD) framework have been shown to face two key challenges: maintaining boundary region completeness and preserving target region connectivity, despite their favorable capacity to recognize main object regions. In this work, we propose a complementary boundary and context distillation (BCD) method within the KD framework for EDIPs, which facilitates targeted knowledge transfer from large accurate teacher models to compact efficient student models. Specifically, the boundary distillation component extracts explicit object-level semantic boundaries from the hierarchical feature maps of the backbone network to enhance the student model's mask quality in boundary regions. Concurrently, the context distillation component leverages self-relations as a bridge to transfer implicit pixel-level contexts from the teacher model to the student model, ensuring strong connectivity in target regions. Our proposed BCD method is specifically designed for EDIP tasks and is characterized by its simplicity and efficiency. Extensive experimental results on semantic segmentation, object detection, and instance segmentation across various representative datasets demonstrate that our method outperforms existing methods without requiring extra supervision or incurring increased inference costs, resulting in well-defined object boundaries and smoothly connected regions.
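
The context distillation component described in the abstract uses pixel-level self-relations as the transfer medium between teacher and student. Below is a minimal PyTorch-style sketch of that idea, assuming teacher and student feature maps share spatial resolution; the function names (self_relation, context_distillation_loss) and the use of cosine-similarity relation matrices with an MSE matching loss are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def self_relation(feat: torch.Tensor) -> torch.Tensor:
    """Pixel-level self-relation (affinity) matrix of a feature map.

    feat: (B, C, H, W) backbone feature map.
    Returns a (B, H*W, H*W) matrix of cosine similarities between
    all pairs of spatial positions.
    """
    x = feat.flatten(2).transpose(1, 2)      # (B, H*W, C)
    x = F.normalize(x, dim=-1)               # unit-normalize each pixel vector
    return torch.bmm(x, x.transpose(1, 2))   # (B, H*W, H*W)


def context_distillation_loss(student_feat: torch.Tensor,
                              teacher_feat: torch.Tensor) -> torch.Tensor:
    """Match the student's self-relations to the teacher's.

    Assumes both maps share spatial size (resize the student map first
    if they do not). This is one plausible instantiation of relation
    matching, not the paper's exact loss.
    """
    rel_s = self_relation(student_feat)
    rel_t = self_relation(teacher_feat).detach()  # teacher is frozen
    return F.mse_loss(rel_s, rel_t)


# Toy usage: channel widths may differ, spatial sizes must match.
student = torch.randn(2, 128, 32, 32)
teacher = torch.randn(2, 512, 32, 32)
loss = context_distillation_loss(student, teacher)
print(loss.item())
```

Because the relation matrices are computed over spatial positions rather than channels, the teacher and student may have different channel widths, which is one reason self-relations are a convenient bridge between models of different capacity.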
