Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational Graph (2405.08547v2)
Abstract: In visual tasks, large teacher models capture essential features and deep information, which boosts performance. Distilling this information into smaller student models, however, often degrades performance because of structural differences and capacity limitations. To address this, we propose a distillation framework based on graph knowledge that combines a multi-level feature alignment strategy with an attention-guided mechanism, providing the student model with a targeted learning trajectory. Central to our distillation process is spectral embedding (SE), which fuses the student's feature space with the relational knowledge and structural complexity of the teacher network. This method captures the teacher's understanding in a graph-based representation, enabling the student model to mimic the complex structural dependencies of the teacher model more faithfully. Unlike methods that focus only on specific distillation regions, our strategy considers not only key features within the teacher model but also the relationships and interactions among feature sets, encoding this information into a graph structure so that the dynamic relationships among features can be understood and exploited from a global perspective. Experiments show that our method outperforms previous feature distillation methods on the CIFAR-100, MS-COCO, and Pascal VOC datasets, demonstrating its efficiency and applicability.
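To make the spectral-embedding idea concrete, below is a minimal PyTorch sketch of one plausible reading of the pipeline: build a channel-to-channel affinity graph from a feature map, embed its normalized graph Laplacian spectrally, and penalize the mismatch between the student's and teacher's graph structure. The function names (`channel_affinity`, `spectral_embed`, `se_distill_loss`), the cosine affinity, the Laplacian normalization, and the Gram-matrix comparison are our illustrative assumptions, not the paper's released code; we also assume the student features have already been projected (e.g., with a 1×1 convolution) to the teacher's channel count.

```python
# Illustrative sketch (assumptions, not the paper's code): distill channel
# relational structure by comparing spectral embeddings of channel graphs.
import torch
import torch.nn.functional as F


def channel_affinity(feat: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity graph over channels.

    feat: (B, C, H, W) activations from one stage of the network.
    Returns a (B, C, C) adjacency matrix.
    """
    b, c, h, w = feat.shape
    x = F.normalize(feat.view(b, c, h * w), dim=2)  # unit-norm channel vectors
    return torch.bmm(x, x.transpose(1, 2))          # entries in [-1, 1]


def spectral_embed(adj: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Embed each channel via the k smallest non-trivial eigenvectors of the
    symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    a = adj.clamp(min=0)                        # keep edge weights non-negative
    d_inv_sqrt = a.sum(dim=2).clamp(min=1e-8).rsqrt()
    n = a.size(1)
    eye = torch.eye(n, device=a.device, dtype=a.dtype)
    lap = eye - d_inv_sqrt.unsqueeze(2) * a * d_inv_sqrt.unsqueeze(1)
    _, vecs = torch.linalg.eigh(lap)            # eigenvalues ascend
    return vecs[..., 1:k + 1]                   # (B, C, k); drop trivial mode


def se_distill_loss(student_feat: torch.Tensor,
                    teacher_feat: torch.Tensor,
                    k: int = 8) -> torch.Tensor:
    """Match student/teacher channel-graph structure in spectral space.

    Assumes student_feat was already projected (e.g., 1x1 conv) to the
    teacher's channel count.
    """
    zs = spectral_embed(channel_affinity(student_feat), k)
    with torch.no_grad():                       # teacher is frozen
        zt = spectral_embed(channel_affinity(teacher_feat), k)
    # Compare Gram matrices Z @ Z^T rather than raw eigenvectors, which
    # removes the sign/rotation ambiguity of eigendecompositions.
    return F.mse_loss(torch.bmm(zs, zs.transpose(1, 2)),
                      torch.bmm(zt, zt.transpose(1, 2)))


# Hypothetical usage: add the term to the task loss at matched backbone stages.
# loss = task_loss + lambda_se * se_distill_loss(f_student, f_teacher)
```

Comparing Gram matrices of the embeddings is one standard way to sidestep the fact that Laplacian eigenvectors are only defined up to sign and rotation; whether the paper uses this exact trick is an assumption on our part.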
- FitNets: Hints for thin deep nets. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
- Knowledge distillation with the reused teacher classifier. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11933–11942, 2022.
- Learning efficient object detection models with knowledge distillation. Advances in Neural Information Processing Systems, 30, 2017.
- Distilling knowledge via knowledge review. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5008–5017, 2021.
- Deep structured instance graph for distilling object detectors. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4359–4368, 2021.
- General instance distillation for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7842–7851, 2021.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
- The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88:303–338, 2010.
- Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28, 2015.
- Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015.
- Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
- Knowledge distillation: A survey. International Journal of Computer Vision, 129(6):1789–1819, 2021.
- Distilling object detectors via decoupled features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2154–2164, 2021.
- Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.
- Knowledge transfer via distillation of activation boundaries formed by hidden neurons. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 3779–3787, 2019.
- Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
- Masked distillation with receptive tokens. arXiv preprint arXiv:2205.14589, 2022.
- TinyBERT: Distilling BERT for natural language understanding. arXiv preprint arXiv:1909.10351, 2019.
- Multi-level logit distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24276–24285, 2023.
- Feature fusion for online mutual knowledge distillation. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 4619–4625. IEEE, 2021.
- Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 2012.
- Mimicking very efficient network for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6356–6364, 2017.
- Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Advances in Neural Information Processing Systems, 33:21002–21012, 2020.
- Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.
- Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017.
- Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V, pages 740–755. Springer, 2014.
- Exploring inter-channel correlation for diversity-preserved knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8271–8280, 2021.
- A survey and performance evaluation of deep learning methods for small object detection. Expert Systems with Applications, 172:114602, 2021.
- Structured knowledge distillation for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2604–2613, 2019.
- Knowledge distillation via instance relationship graph. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7096–7104, 2019.
- Spectral embedding of graphs. Pattern Recognition, 36(10):2213–2230, 2003.
- On the benefits of knowledge distillation for adversarial robustness. arXiv preprint arXiv:2203.07159, 2022.
- Triplet loss for knowledge distillation. In 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2020.
- Libra R-CNN: Towards balanced learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 821–830, 2019.
- Efficient medical image segmentation based on knowledge distillation. IEEE Transactions on Medical Imaging, 40(12):3820–3831, 2021.
- How and when adversarial robustness transfers in knowledge distillation? arXiv preprint arXiv:2110.12072, 2021.
- Distilling object detectors with task adaptive regularization. arXiv preprint arXiv:2006.13108, 2020.
- Contrastive representation distillation. arXiv preprint arXiv:1910.10699, 2019.
- Fully convolutional one-stage 3d object detection on lidar range images. Advances in Neural Information Processing Systems, 35:34899–34911, 2022.
- Similarity-preserving knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1365–1374, 2019.
- HEAD: Hetero-assists distillation for heterogeneous object detectors. In European Conference on Computer Vision, pages 314–331. Springer, 2022.
- Distilling object detectors with fine-grained feature imitation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4933–4942, 2019.
- Preparing lessons: Improve knowledge distillation with better supervision. Neurocomputing, 454:25–33, 2021.
- Regional filtering distillation for object detection. Machine Vision and Applications, 35(2):24, 2024.
- Context matters: Distilling knowledge graph for enhanced object detection. IEEE Transactions on Multimedia, 2023.
- Cross-image relational knowledge distillation for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12319–12328, 2022.
- Focal and global knowledge distillation for detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4643–4652, 2022.
- A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4133–4141, 2017.
- Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. arXiv preprint arXiv:1612.03928, 2016.
- Structured knowledge distillation for accurate and efficient object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
- Localization distillation for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
- Distilling object detectors with feature richness. Advances in Neural Information Processing Systems, 34:5213–5224, 2021.
- Distilling holistic knowledge with graph neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10387–10396, 2021.
- Revisiting adversarial robustness distillation: Robust soft labels make student better. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16443–16452, 2021.