
Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery (2403.05381v1)

Published 8 Mar 2024 in cs.CV

Abstract: The goal of this paper is to perform object detection in satellite imagery with only a few examples, thus enabling users to specify any object class with minimal annotation. To this end, we explore recent methods and ideas from open-vocabulary detection for the remote sensing domain. We develop a few-shot object detector based on a traditional two-stage architecture, where the classification block is replaced by a prototype-based classifier. A large-scale pre-trained model is used to build class-reference embeddings or prototypes, which are compared to region proposal contents for label prediction. In addition, we propose to fine-tune prototypes on available training images to boost performance and learn differences between similar classes, such as aircraft types. We perform extensive evaluations on two remote sensing datasets containing challenging and rare objects. Moreover, we study the performance of both visual and image-text features, namely DINOv2 and CLIP, including two CLIP models specifically tailored for remote sensing applications. Results indicate that visual features are largely superior to vision-language models, as the latter lack the necessary domain-specific vocabulary. Lastly, the developed detector outperforms fully supervised and few-shot methods evaluated on the SIMD and DIOR datasets, despite minimal training parameters.
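
The prototype-based classification step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not state how prototypes are built or compared, so averaging the support embeddings per class and scoring by cosine similarity are assumptions made here. The function names (`build_prototypes`, `classify_proposal`) and the 2-D toy vectors are hypothetical stand-ins for features from a pre-trained model such as DINOv2 or CLIP, and the prototype fine-tuning stage is omitted.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def build_prototypes(support):
    """Build one prototype per class from a few example embeddings.

    Assumption: the prototype is the normalized mean of the support embeddings.
    support maps a class name to a list of equal-length embedding vectors.
    """
    prototypes = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
        prototypes[label] = l2_normalize(mean)
    return prototypes


def classify_proposal(embedding, prototypes):
    """Label a region-proposal embedding with its most similar prototype.

    Assumption: similarity is the cosine between proposal and prototype.
    Returns the best label and the per-class similarity scores.
    """
    query = l2_normalize(embedding)
    scores = {
        label: sum(q * p for q, p in zip(query, proto))
        for label, proto in prototypes.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy 2-D embeddings standing in for real backbone features (hypothetical data).
support = {
    "aircraft": [[1.0, 0.0], [0.8, 0.2]],
    "ship": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support)
label, scores = classify_proposal([0.7, 0.3], prototypes)
print(label, scores)
```

In a two-stage detector, `classify_proposal` would take the place of the usual classification head: each region proposal is embedded by the frozen backbone and scored against every class prototype, so adding a new class only requires computing one more prototype from a handful of examples.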

