Query-guided Prototype Evolution Network for Few-Shot Segmentation (2403.06488v2)
Abstract: Previous Few-Shot Segmentation (FSS) approaches rely exclusively on support features for prototype generation, neglecting the specific requirements of the query. To address this, we present the Query-guided Prototype Evolution Network (QPENet), a new method that integrates query features into the generation of foreground and background prototypes, thereby yielding customized prototypes attuned to specific queries. The foreground prototype evolves through a \textit{support-query-support} iterative process involving two new modules: Pseudo-prototype Generation (PPG) and Dual Prototype Evolution (DPE). The PPG module employs support features to create an initial prototype for a preliminary segmentation of the query image, resulting in a pseudo-prototype that reflects the unique needs of the current query. Subsequently, the DPE module performs reverse segmentation on the support images using this pseudo-prototype, leading to evolved prototypes that can be considered custom solutions. For the background prototype, evolution begins with a global background prototype representing the generalized features of all training images. We also design a Global Background Cleansing (GBC) module to eliminate potential adverse components that mirror the characteristics of the current foreground class. Experimental results on the PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate that QPENet substantially outperforms prevailing state-of-the-art methods, underscoring the validity of our ideas.
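To make the \textit{support-query-support} loop concrete, here is a minimal PyTorch sketch of the PPG and DPE steps as the abstract describes them. The masked average pooling, the cosine-similarity segmentation head, the temperature value, and all function names and tensor shapes are illustrative assumptions, not the paper's released implementation.

```python
# A minimal sketch of QPENet's foreground prototype evolution, assuming a
# masked-average-pooling prototype and a cosine-similarity segmentation head.
# These choices are illustrative, not the authors' actual implementation.
import torch
import torch.nn.functional as F

def masked_average_pooling(feat, mask):
    """Pool features over a (soft or binary) mask into a (B, C) prototype.

    feat: (B, C, H, W) feature map; mask: (B, 1, h, w) mask in [0, 1].
    """
    mask = F.interpolate(mask, size=feat.shape[-2:],
                         mode="bilinear", align_corners=False)
    return (feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)

def cosine_segment(feat, prototype, temperature=20.0):
    """Produce a soft foreground mask by cosine similarity to a prototype."""
    proto = prototype[..., None, None]                    # (B, C, 1, 1)
    sim = F.cosine_similarity(feat, proto, dim=1)         # (B, H, W)
    return torch.sigmoid(temperature * sim).unsqueeze(1)  # (B, 1, H, W)

def evolve_foreground_prototype(support_feat, support_mask, query_feat):
    """Support-query-support evolution (PPG then DPE), per the abstract."""
    # PPG: an initial support prototype pre-segments the query; pooling the
    # query features under that mask gives a query-specific pseudo-prototype.
    init_proto = masked_average_pooling(support_feat, support_mask)
    query_soft_mask = cosine_segment(query_feat, init_proto)
    pseudo_proto = masked_average_pooling(query_feat, query_soft_mask)

    # DPE: reverse segmentation of the support with the pseudo-prototype;
    # pooling under that mask yields the evolved, query-attuned prototype.
    support_soft_mask = cosine_segment(support_feat, pseudo_proto)
    return masked_average_pooling(support_feat, support_soft_mask)
```

In a 1-shot episode, the returned evolved prototype would stand in for the vanilla support prototype when matching against the query features; the GBC cleansing of the global background prototype is omitted from this sketch for brevity.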