AirShot: Efficient Few-Shot Detection for Autonomous Exploration (2404.05069v1)
Abstract: Few-shot object detection has drawn increasing attention in robotic exploration, where robots must find unseen objects from a few examples provided online. Although recent efforts have enabled online processing, inference on low-powered robots remains too slow for real-time detection, making these methods impractical for autonomous exploration. Existing methods still face performance and efficiency challenges, mainly due to unreliable features and exhaustive class loops. In this work, we propose a new paradigm, AirShot, and show that fully exploiting the correlation map yields a more robust and faster few-shot object detection system that is better suited to the robotics community. The core module, the Top Prediction Filter (TPF), operates on multi-scale correlation maps in both the training and inference stages. During training, TPF supervises the generation of a more representative correlation map; during inference, it reduces the number of loop iterations by selecting top-ranked classes, cutting computational cost while improving performance. Surprisingly, this dual functionality proves effective and efficient across various off-the-shelf models. Extensive experiments on the COCO2017, VOC2014, and SubT datasets demonstrate that TPF can significantly boost the efficacy and efficiency of most off-the-shelf models, achieving up to 36.4% precision improvement along with 56.3% faster inference. Code and data are available at: https://github.com/ImNotPrepared/AirShot.
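The inference-time idea described in the abstract, ranking classes by their correlation maps and looping only over the top-ranked ones, can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring rule (peak response across scales), the function names `class_score`, `top_prediction_filter`, and `detect_with_filter`, and the `run_head` callback are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

# One 2D correlation map per scale; a class owns a list of such maps.
CorrMap = Sequence[Sequence[float]]


def class_score(multi_scale_maps: Sequence[CorrMap]) -> float:
    """Assumed scoring rule: the peak response over all scales of one class."""
    return max(v for m in multi_scale_maps for row in m for v in row)


def top_prediction_filter(corr_by_class: Dict[str, Sequence[CorrMap]], k: int) -> List[str]:
    """Return the k class names with the highest assumed score, best first."""
    scores = {name: class_score(maps) for name, maps in corr_by_class.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k]


def detect_with_filter(
    corr_by_class: Dict[str, Sequence[CorrMap]],
    k: int,
    run_head: Callable[[str], list],
) -> Dict[str, list]:
    """Run the (hypothetical) per-class detection head only on the kept classes."""
    kept = top_prediction_filter(corr_by_class, k)
    return {name: run_head(name) for name in kept}


# Toy example: three candidate classes, two scales each.
toy_maps = {
    "helmet": [[[0.1, 0.9], [0.2, 0.3]], [[0.4]]],
    "rope": [[[0.1, 0.2], [0.2, 0.3]], [[0.1]]],
    "drill": [[[0.5, 0.6], [0.2, 0.3]], [[0.7]]],
}
calls: List[str] = []


def toy_head(name: str) -> list:
    calls.append(name)  # record which classes actually reach the head
    return []


results = detect_with_filter(toy_maps, k=2, run_head=toy_head)
```

In this toy run the head is invoked for two of the three classes, which is the source of the saving: the per-class loop shrinks from the full class count to `k`.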