Pixel-Wise Contrastive Distillation (2211.00218v3)

Published 1 Nov 2022 in cs.CV

Abstract: We present a simple but effective pixel-level self-supervised distillation framework friendly to dense prediction tasks. Our method, called Pixel-Wise Contrastive Distillation (PCD), distills knowledge by attracting the corresponding pixels from student's and teacher's output feature maps. PCD includes a novel design called SpatialAdaptor which "reshapes" a part of the teacher network while preserving the distribution of its output features. Our ablation experiments suggest that this reshaping behavior enables more informative pixel-to-pixel distillation. Moreover, we utilize a plug-in multi-head self-attention module that explicitly relates the pixels of student's feature maps to enhance the effective receptive field, leading to a more competitive student. PCD outperforms previous self-supervised distillation methods on various dense prediction tasks. A backbone of ResNet-18-FPN distilled by PCD achieves $37.4$ AP$^\text{bbox}$ and $34.0$ AP$^\text{mask}$ on COCO dataset using the detector of Mask R-CNN. We hope our study will inspire future research on how to pre-train a small model friendly to dense prediction tasks in a self-supervised fashion.
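
The abstract describes two main components: a pixel-to-pixel contrastive loss that attracts corresponding pixels in the student's and teacher's output feature maps, and a plug-in multi-head self-attention module that relates the student's pixels to enlarge the effective receptive field. The sketch below is a minimal PyTorch illustration of those two ideas only, assuming an InfoNCE-style loss in which the same-location teacher pixel is the positive and all other teacher pixels are negatives. The names `pixel_contrastive_distillation` and `PixelRelationHead` and the temperature value are illustrative assumptions, not the paper's implementation; the SpatialAdaptor and any projection heads are omitted.

```python
# Minimal sketch of pixel-wise contrastive distillation (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def pixel_contrastive_distillation(student_feat, teacher_feat, temperature=0.2):
    """student_feat, teacher_feat: (B, C, H, W) feature maps of matching shape."""
    b, c, h, w = student_feat.shape
    # Flatten spatial dimensions to a set of pixel vectors: (B, H*W, C).
    s = F.normalize(student_feat.flatten(2).transpose(1, 2), dim=-1)
    t = F.normalize(teacher_feat.flatten(2).transpose(1, 2), dim=-1)
    # Cosine similarities between every student pixel and every teacher pixel.
    logits = torch.bmm(s, t.transpose(1, 2)) / temperature  # (B, H*W, H*W)
    # Positives are the corresponding (same spatial location) teacher pixels.
    targets = torch.arange(h * w, device=logits.device).expand(b, -1)
    return F.cross_entropy(logits.reshape(b * h * w, h * w), targets.reshape(-1))


class PixelRelationHead(nn.Module):
    """Illustrative plug-in multi-head self-attention over the student's pixels,
    intended to enlarge the effective receptive field before the loss."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat):  # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)       # (B, H*W, C)
        tokens, _ = self.attn(tokens, tokens, tokens)  # relate all pixel pairs
        return tokens.transpose(1, 2).reshape(b, c, h, w)
```

In this reading, the attention head is applied to the student branch before the loss is computed, which matches the abstract's description of explicitly relating the pixels of the student's feature maps; how the teacher branch is reshaped by the SpatialAdaptor is left to the paper itself.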
