
Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning (2306.13337v1)

Published 23 Jun 2023 in cs.CV

Abstract: We propose ADCLR (Accurate and Dense Contrastive Representation Learning), a novel self-supervised learning framework for learning accurate and dense vision representations. To extract spatially sensitive information, ADCLR introduces query patches for contrasting in addition to global contrasting. Compared with previous dense contrasting methods, ADCLR enjoys three main merits: i) it achieves representations that are both globally discriminative and spatially sensitive; ii) it is model-efficient (no extra parameters beyond the global contrasting baseline); and iii) it is correspondence-free and thus simpler to implement. Our approach achieves new state-of-the-art performance among contrastive methods. On classification tasks, with ViT-S, ADCLR achieves 77.5% top-1 accuracy on ImageNet with linear probing, outperforming our baseline (DINO) without our devised techniques as a plug-in by 0.5%. With ViT-B, ADCLR achieves 79.8% and 84.0% accuracy on ImageNet by linear probing and fine-tuning, outperforming iBOT by 0.3% and 0.2%, respectively. On dense tasks, on MS-COCO, ADCLR achieves significant improvements of 44.3% AP on object detection and 39.7% AP on instance segmentation, outperforming the previous SOTA method SelfPatch by 2.2% and 1.2%, respectively. On ADE20K, ADCLR outperforms SelfPatch by 1.0% mIoU and 1.2% mAcc on the segmentation task.
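The abstract describes contrasting query-patch tokens across augmented views with an InfoNCE-style objective, without matching patches between views. As a rough illustration of the underlying loss (not the authors' code; the function name and plain-list representation are hypothetical simplifications), a single-anchor InfoNCE term can be sketched in pure Python:

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor embedding.

    anchor, positive, and each entry of negatives are plain lists of
    floats (assumed unit-normalised, so dot product = cosine similarity).
    Returns -log( exp(s_pos/t) / (exp(s_pos/t) + sum exp(s_neg/t)) ).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    pos_logit = dot(anchor, positive) / temperature
    neg_logits = [dot(anchor, n) / temperature for n in negatives]
    # log of the softmax denominator over the positive and all negatives
    log_denom = math.log(sum(math.exp(l) for l in [pos_logit] + neg_logits))
    return log_denom - pos_logit
```

In a framework like ADCLR, a loss of this shape would be applied both to the global ([CLS]) tokens and to the extra query-patch tokens of two augmented views; because the query patches are contrasted directly rather than matched to specific patch locations in the other view, no cross-view patch correspondence (and no extra parameters) is needed.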

References (50)
  1. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
  2. BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254, 2021.
  3. VICReg: Variance-invariance-covariance regularization for self-supervised learning. In ICLR, 2022.
  4. Unsupervised learning of visual features by contrasting cluster assignments. NeurIPS, 2020.
  5. Emerging properties in self-supervised vision transformers. In ICCV, 2021.
  6. A simple framework for contrastive learning of visual representations. In ICML, 2020a.
  7. Context autoencoder for self-supervised representation learning. arXiv preprint arXiv:2202.03026, 2022.
  8. Exploring simple siamese representation learning. In CVPR, 2021.
  9. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020b.
  10. An empirical study of training self-supervised vision transformers. In ICCV, 2021.
  11. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.
  12. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2020.
  13. Corrupted image modeling for self-supervised visual pre-training. arXiv preprint arXiv:2202.03382, 2022.
  14. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728, 2018.
  15. Generative adversarial nets. NeurIPS, 27, 2014.
  16. Bootstrap your own latent-a new approach to self-supervised learning. NeurIPS, 2020.
  17. Deep residual learning for image recognition. In CVPR, 2016.
  18. Mask R-CNN. In ICCV, 2017.
  19. Momentum contrast for unsupervised visual representation learning. In CVPR, 2020.
  20. Masked autoencoders are scalable vision learners. arXiv preprint arXiv:2111.06377, 2021.
  21. Object discovery and representation networks. ECCV, 2022.
  22. Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670, 2018.
  23. Acquisition of localization confidence for accurate object detection. In ECCV, 2018.
  24. Expectation-maximization contrastive learning for compact video-and-language representations. In NeurIPS, 2022.
  25. Microsoft coco: Common objects in context. In ECCV, 2014.
  26. Feature pyramid networks for object detection. In CVPR, 2017.
  27. Swin transformer: Hierarchical vision transformer using shifted windows. In ICCV, 2021.
  28. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.
  29. Fixing weight decay regularization in Adam. 2018.
  30. Dall-e: Creating images from text. UGC Care Group I Journal, 2021.
  31. The inaturalist species classification and detection dataset. In CVPR, 2018.
  32. Attention is all you need. NeurIPS, 2017.
  33. Extracting and composing robust features with denoising autoencoders. In ICML, 2008.
  34. Dense contrastive learning for self-supervised visual pre-training. In CVPR, 2021.
  35. Exploring set similarity for dense self-supervised representation learning. In CVPR, 2022.
  36. Aligning pretraining for detection via object-level contrastive learning. NeurIPS, 2021.
  37. Unified perceptual parsing for scene understanding. In ECCV, 2018.
  38. Region similarity representation learning. In ICCV, 2021.
  39. DetCo: Unsupervised contrastive learning for object detection. In ICCV, 2021a.
  40. Self-supervised learning with swin transformers. arXiv preprint arXiv:2105.04553, 2021b.
  41. Propagate yourself: Exploring pixel-level consistency for unsupervised visual representation learning. In CVPR, 2021c.
  42. SimMIM: A simple framework for masked image modeling. arXiv preprint arXiv:2111.09886, 2021d.
  43. Masked image modeling with denoising contrast. arXiv preprint arXiv:2205.09616, 2022.
  44. Patch-level representation learning for self-supervised vision transformers. In CVPR, 2022.
  45. Barlow twins: Self-supervised learning via redundancy reduction. In ICML, 2021.
  46. Colorful image colorization. In ECCV, 2016.
  47. Zero-CL: Instance and feature decorrelation for negative-free symmetric contrastive learning. In ICLR, 2021.
  48. Align representations with base: A new approach to self-supervised learning. In CVPR, 2022.
  49. Scene parsing through ade20k dataset. In CVPR, 2017.
  50. Image BERT pre-training with online tokenizer. In ICLR, 2022.
Citations (15)
