Unsupervised learning based object detection using Contrastive Learning (2402.13465v1)
Abstract: Training image-based object detectors presents formidable challenges, as it entails not only the complexities of recognizing objects but also the added intricacies of precisely localizing them within potentially diverse and noisy environments. The collection of imagery itself, however, is often straightforward; for instance, cameras mounted in vehicles can effortlessly capture vast amounts of data across varied real-world scenarios. In light of this, we introduce a method for training single-stage object detectors through unsupervised/self-supervised learning. Our approach has the potential to streamline the labeling process, substantially reducing the time and cost associated with manual annotation, and it opens up previously unattainable research opportunities, particularly for large, diverse, and challenging datasets that lack extensive labels. In contrast to prevalent unsupervised learning methods that primarily target classification tasks, our approach takes on the distinct challenge of object detection. We introduce intra-image contrastive learning alongside its inter-image counterpart, enabling the acquisition of the location information essential for object detection. The method learns and represents this location information as informative heatmaps. Our results show an accuracy of 89.2%, an improvement of approximately 15x over random initialization for unsupervised object detection.
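The abstract names the core mechanism, contrastive learning applied both between images and between regions of the same image, without spelling out how such a loss is formed. The sketch below is a minimal illustration of a generic InfoNCE-style contrastive loss used in an inter-image and an intra-image setting; the function names, the temperature value, and the patch-pooling scheme are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch: an InfoNCE-style contrastive loss applied inter-image
# (between embeddings of different images) and intra-image (between
# embeddings of patches pooled from one image's feature map).
# All names, shapes, and the temperature value are illustrative assumptions.
import torch
import torch.nn.functional as F

def info_nce(queries: torch.Tensor, keys: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """queries, keys: (N, D) embeddings where (queries[i], keys[i]) is a positive pair."""
    q = F.normalize(queries, dim=1)
    k = F.normalize(keys, dim=1)
    logits = q @ k.t() / temperature           # (N, N) cosine-similarity logits
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)    # positives sit on the diagonal

def intra_image_embeddings(feature_map: torch.Tensor, grid: int = 4) -> torch.Tensor:
    """Pool a (C, H, W) feature map into grid*grid patch embeddings of shape (grid*grid, C)."""
    pooled = F.adaptive_avg_pool2d(feature_map.unsqueeze(0), grid)   # (1, C, grid, grid)
    return pooled.flatten(2).squeeze(0).t()                          # (grid*grid, C)

# Usage: two augmented views of the same batch give inter-image positives;
# matching patches from two views of one image give intra-image positives.
feats_a = torch.randn(8, 128)      # stand-in for view-A image embeddings
feats_b = torch.randn(8, 128)      # stand-in for view-B image embeddings
inter_loss = info_nce(feats_a, feats_b)

fmap_a = torch.randn(128, 32, 32)  # stand-in backbone feature map, view A
fmap_b = torch.randn(128, 32, 32)  # stand-in backbone feature map, view B
intra_loss = info_nce(intra_image_embeddings(fmap_a), intra_image_embeddings(fmap_b))

total_loss = inter_loss + intra_loss
```

Combining the two terms is one plausible way to encourage both image-level discrimination and spatial (location-aware) features such as the heatmaps described above; the paper's actual loss may differ.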