
Zero-Shot Aerial Object Detection with Visual Description Regularization (2402.18233v2)

Published 28 Feb 2024 in cs.CV

Abstract: Existing object detection models are mainly trained on large-scale labeled datasets. However, annotating data for novel aerial object classes is expensive, since it is time-consuming and may require expert knowledge. Thus, it is desirable to study label-efficient object detection methods on aerial images. In this work, we propose a zero-shot method for aerial object detection named visual Description Regularization, or DescReg. Concretely, we identify the weak semantic-visual correlation of aerial objects and aim to address this challenge with prior descriptions of their visual appearance. Instead of directly encoding the descriptions into the class embedding space, which suffers from the representation gap problem, we propose to infuse the prior inter-class visual similarity conveyed in the descriptions into the embedding learning. The infusion process is accomplished with a newly designed similarity-aware triplet loss, which incorporates structured regularization on the representation space. We conduct extensive experiments on three challenging aerial object detection datasets: DIOR, xView, and DOTA. The results demonstrate that DescReg significantly outperforms state-of-the-art zero-shot detection (ZSD) methods with complex projection designs and generative frameworks; e.g., DescReg outperforms the best reported ZSD method on DIOR by 4.5 mAP on unseen classes and by 8.1 in harmonic mean (HM). We further show the generalizability of DescReg by integrating it into generative ZSD methods as well as by varying the detection architecture.
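
To make the idea of a similarity-aware triplet loss more concrete, here is a minimal NumPy sketch. It is not the paper's formulation, which the abstract does not spell out. It assumes a hypothetical matrix `prior_sim` of inter-class visual similarities in [0, 1] (for example, derived from text descriptions), takes the most similar class under that prior as the positive for each anchor, and scales a hypothetical `base_margin` by the gap in prior similarity. The function names and the margin rule are illustrative assumptions.

```python
import numpy as np


def cosine_matrix(emb):
    """Pairwise cosine similarity between the rows of emb, shape (C, d)."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return normed @ normed.T


def similarity_aware_triplet_loss(emb, prior_sim, base_margin=0.2):
    """Illustrative triplet loss over class embeddings (an assumption, not the paper's loss).

    emb:       (C, d) class embeddings being learned.
    prior_sim: (C, C) symmetric prior visual similarity in [0, 1].
    For each anchor class a, the positive p is the class most similar to a
    under the prior; every other class n is a negative. The margin grows
    with how much more similar p is than n according to the prior.
    """
    num_classes = emb.shape[0]
    emb_sim = cosine_matrix(emb)
    total, count = 0.0, 0
    for a in range(num_classes):
        others = [c for c in range(num_classes) if c != a]
        p = max(others, key=lambda c: prior_sim[a, c])
        for n in others:
            if n == p:
                continue
            margin = base_margin * (prior_sim[a, p] - prior_sim[a, n])
            # Hinge: penalize when the negative is not at least `margin`
            # less similar to the anchor than the positive is.
            total += max(0.0, emb_sim[a, n] - emb_sim[a, p] + margin)
            count += 1
    return total / max(count, 1)


# Toy prior: classes 0 and 1 look alike; class 2 looks different from both.
prior = np.array([[1.0, 0.9, 0.1],
                  [0.9, 1.0, 0.1],
                  [0.1, 0.1, 1.0]])
# Embeddings whose geometry agrees with the prior ...
aligned = np.array([[1.0, 0.1], [1.0, -0.1], [0.0, 1.0]])
# ... and embeddings where classes 1 and 2 are swapped.
swapped = aligned[[0, 2, 1]]

print(similarity_aware_triplet_loss(aligned, prior))
print(similarity_aware_triplet_loss(swapped, prior))
```

In this toy setup the loss vanishes when the embedding geometry respects the prior similarity structure and becomes positive when it does not, which is the kind of structured regularization on the representation space that the abstract describes.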
