
Semi-supervised Open-World Object Detection (2402.16013v1)

Published 25 Feb 2024 in cs.CV

Abstract: The conventional open-world object detection (OWOD) problem setting first distinguishes known from unknown classes and then incrementally learns the unknown objects once their labels are introduced in subsequent tasks. However, the current OWOD formulation relies heavily on an external human oracle for knowledge input during the incremental learning stages. Such reliance at run time makes this formulation less realistic for real-world deployment. To address this, we introduce a more realistic formulation, named semi-supervised open-world detection (SS-OWOD), that reduces the annotation cost by casting the incremental learning stages of OWOD in a semi-supervised manner. We demonstrate that the performance of the state-of-the-art OWOD detector deteriorates dramatically in the proposed SS-OWOD setting. Therefore, we introduce a novel SS-OWOD detector, named SS-OWFormer, that uses a feature-alignment scheme to better align the object query representations between the original and augmented images, leveraging the large amount of unlabeled data together with the few labeled samples. We further introduce a pseudo-labeling scheme for unknown detection that exploits the inherent capability of decoder object queries to capture object-specific information. We also demonstrate the effectiveness of our SS-OWOD problem setting and approach for remote sensing object detection, proposing carefully curated splits and baseline performance evaluations. Our experiments on four datasets, MS COCO, PASCAL, Objects365 and DOTA, demonstrate the effectiveness of our approach. Our source code, models and splits are available at https://github.com/sahalshajim/SS-OWFormer
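
The two ideas named in the abstract, aligning object query representations across an image and its augmented view, and pseudo-labeling unknowns from decoder queries, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the loss or the selection rule, so the cosine-based consistency loss, the top-k selection by score, and the names `alignment_loss` and `select_unknown_pseudo_labels` are hypothetical and not taken from the SS-OWFormer code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_loss(queries_orig, queries_aug):
    """Mean (1 - cosine similarity) over paired query embeddings.

    ASSUMPTION: a generic consistency loss standing in for the paper's
    feature-alignment scheme, whose exact form the abstract does not give.
    queries_orig[i] and queries_aug[i] are the embeddings of the same
    object query for the original and the augmented image.
    """
    if len(queries_orig) != len(queries_aug):
        raise ValueError("both views must provide the same number of queries")
    if not queries_orig:
        return 0.0
    total = sum(
        1.0 - cosine_similarity(q, q_aug)
        for q, q_aug in zip(queries_orig, queries_aug)
    )
    return total / len(queries_orig)


def select_unknown_pseudo_labels(query_scores, matched_indices, top_k):
    """Return indices of the top_k highest-scoring queries not matched to a known object.

    ASSUMPTION: a simple top-k rule used only to illustrate pseudo-labeling
    from decoder queries; the scoring function is hypothetical.
    """
    matched = set(matched_indices)
    candidates = [i for i in range(len(query_scores)) if i not in matched]
    candidates.sort(key=lambda i: query_scores[i], reverse=True)
    return candidates[:top_k]
```

In this sketch the alignment loss needs no labels, which is why a term of this kind can be computed on unlabeled images, while the pseudo-label selection only considers queries that were not already assigned to a known-class annotation.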

Authors (5)
  1. Sahal Shaji Mullappilly (9 papers)
  2. Abhishek Singh Gehlot (3 papers)
  3. Rao Muhammad Anwer (67 papers)
  4. Fahad Shahbaz Khan (225 papers)
  5. Hisham Cholakkal (78 papers)
Citations (4)
