Few-shot Object Localization (2403.12466v3)
Abstract: Existing object localization methods are tailored to locate specific classes of objects and rely heavily on abundant labeled data for model optimization. However, acquiring large amounts of labeled data is challenging in many real-world scenarios, significantly limiting the broader application of localization models. To bridge this research gap, this paper defines a novel task named Few-Shot Object Localization (FSOL), which aims to achieve precise localization with limited samples: a small number of labeled support samples are used to query the positional information of objects within corresponding images, yielding generalized object localization. To advance this field, we design an innovative high-performance baseline model. This model integrates a dual-path feature augmentation module to enhance shape association and gradient differences between support and query images, alongside a self-query module to explore the association between feature maps and query images. Experimental results demonstrate a significant performance improvement of our approach on the FSOL task, establishing an efficient benchmark for further research. All code and data are available at https://github.com/Ryh1218/FSOL.
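The core idea the abstract describes, correlating a labeled support sample with a query feature map to obtain a localization response, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, feature shapes, and the use of plain cosine similarity (in place of the dual-path augmentation and self-query modules) are assumptions for exposition.

```python
import numpy as np

def localize(query_feats: np.ndarray, support_feat: np.ndarray) -> np.ndarray:
    """Cosine-similarity response of one support feature over a query feature map.

    query_feats:  (C, H, W) backbone features of the query image (illustrative).
    support_feat: (C,) pooled feature of one support exemplar (illustrative).
    Returns an (H, W) similarity map; peaks indicate likely object locations.
    """
    C, H, W = query_feats.shape
    q = query_feats.reshape(C, -1)                                  # (C, H*W)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)       # unit-normalize each location
    s = support_feat / (np.linalg.norm(support_feat) + 1e-8)        # unit-normalize support
    return (s @ q).reshape(H, W)                                    # cosine similarity per location

# Toy check: plant the support feature at query location (1, 2) and recover it.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 5))
support = feats[:, 1, 2].copy()
heat = localize(feats, support)
peak = np.unravel_index(np.argmax(heat), heat.shape)   # location of the strongest response
```

In the actual FSOL model, this raw correlation is refined by the dual-path feature augmentation and self-query modules before the final localization map is produced.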