Detail Reinforcement Diffusion Model: Augmentation Fine-Grained Visual Categorization in Few-Shot Conditions (2309.08097v2)

Published 15 Sep 2023 in cs.CV

Abstract: The challenge in fine-grained visual categorization (FGVC) lies in exploring the subtle differences between subclasses and achieving accurate discrimination. Previous research has relied on large-scale annotated data and pre-trained deep models to achieve this objective. However, when only a limited number of samples are available, such methods may become less effective. Diffusion models have been widely adopted for data augmentation because of the outstanding diversity of the data they generate. However, the high level of detail required for fine-grained images makes it challenging to apply existing methods directly. To address this issue, we propose a novel approach termed the detail reinforcement diffusion model (DRDM), which leverages the rich knowledge of large models for fine-grained data augmentation and comprises two key components: discriminative semantic recombination (DSR) and spatial knowledge reference (SKR). Specifically, DSR is designed to extract implicit similarity relationships from the labels and reconstruct the semantic mapping between labels and instances, which enables better discrimination of subtle differences between subclasses. Furthermore, we introduce the SKR module, which incorporates the distributions of different datasets as references in the feature space. This allows SKR to aggregate the high-dimensional distributions of subclass features in few-shot FGVC tasks, thus expanding the decision boundary. Through these two components, we effectively exploit the knowledge of large models to address data scarcity, improving performance on fine-grained visual recognition tasks. Extensive experiments demonstrate the consistent performance gains offered by our DRDM.
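To make the augmentation setting concrete, the sketch below shows the generic step such methods build on: generating label-conditioned synthetic images for a data-scarce subclass with an off-the-shelf latent diffusion model. This is a minimal sketch, not the authors' method: it assumes the Hugging Face `diffusers` library, the checkpoint name, prompt template, and `augment_class` helper are illustrative assumptions, and the paper's DSR and SKR components are not reproduced here.

```python
# Minimal sketch of label-conditioned diffusion augmentation for a
# few-shot fine-grained class. Assumes the Hugging Face `diffusers`
# library and a generic Stable Diffusion checkpoint; the paper's DSR
# and SKR modules are NOT implemented here.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base model, not the paper's
    torch_dtype=torch.float16,
).to("cuda")

def augment_class(label: str, n_images: int = 4):
    """Generate n_images synthetic samples for one fine-grained subclass.

    The prompt simply embeds the raw class label; the paper's DSR module
    would instead recombine label semantics to emphasize the subtle
    differences between visually similar subclasses.
    """
    prompt = f"a photo of a {label}, detailed close-up"
    out = pipe(prompt, num_images_per_prompt=n_images)
    return out.images  # list of PIL images

# Example: expand a 5-shot bird class with synthetic images.
for i, img in enumerate(augment_class("Black-footed Albatross")):
    img.save(f"aug_black_footed_albatross_{i}.png")
```

The gap this naive baseline leaves, per the abstract, is that plain prompt conditioning does not preserve the fine detail that separates subclasses, which is the problem DSR and SKR are designed to address.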
