
Categorical Knowledge Fused Recognition: Fusing Hierarchical Knowledge with Image Classification through Aligning and Deep Metric Learning (2407.20600v2)

Published 30 Jul 2024 in cs.CV

Abstract: Image classification is a fundamental computer vision task and an important baseline for deep metric learning. For decades, efforts have focused on improving image classification accuracy with deep learning models, while less attention has been paid to the reasoning aspect of recognition; that is, predictions may be driven by the background or surrounding objects rather than the target object. Hierarchical knowledge about image categories captures inter-class similarities and dissimilarities. Effectively fusing such knowledge with deep learning image classification models is promising for improving target object identification and enhancing the reasoning aspect of recognition. In this paper, we propose a novel deep metric learning based method that fuses prior knowledge about image categories with mainstream backbone image classification models and enhances the reasoning aspect of recognition in an end-to-end manner. Existing image classification methods that incorporate deep metric learning mainly focus on whether sampled images are from the same class. We present a new triplet loss term that aligns distances in the model latent space with those in the knowledge space, and incorporate it in the proposed method to facilitate the dual-modality fusion. Extensive experiments on the CIFAR-10, CIFAR-100, Mini-ImageNet, and ImageNet-1K datasets indicate that the proposed method is effective in enhancing the reasoning aspect of image recognition, as measured by weakly-supervised object localization performance.
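
To make the idea of aligning latent-space distances with knowledge-space distances concrete, here is a minimal sketch in Python. It is not the paper's formulation: the abstract does not give the loss, so the squared-difference alignment penalty, the Euclidean latent distance, the function names, and the `margin` and `weight` parameters are all assumptions chosen for illustration. The knowledge distances (`know_ap`, `know_an`) stand in for something like a path length between two classes in a category hierarchy.

```python
import numpy as np


def latent_distance(x, y):
    """Euclidean distance between two embedding vectors."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))


def aligned_triplet_loss(anchor, positive, negative,
                         know_ap, know_an, margin=1.0, weight=0.5):
    """Hypothetical triplet loss with a knowledge-alignment term.

    anchor, positive, negative: embedding vectors from the model.
    know_ap, know_an: assumed knowledge-space distances between the
        anchor's class and the positive's / negative's class.
    margin, weight: illustrative hyperparameters, not from the paper.
    """
    d_ap = latent_distance(anchor, positive)
    d_an = latent_distance(anchor, negative)

    # Standard triplet term: the positive should be closer than the
    # negative by at least `margin`.
    triplet_term = max(0.0, d_ap - d_an + margin)

    # Assumed alignment term: penalize mismatch between latent
    # distances and the corresponding knowledge-space distances.
    align_term = (d_ap - know_ap) ** 2 + (d_an - know_an) ** 2

    return triplet_term + weight * align_term


if __name__ == "__main__":
    a = [0.0, 0.0]
    p = [0.0, 0.0]
    n = [3.0, 4.0]  # latent distance from the anchor is 5
    print(aligned_triplet_loss(a, p, n, know_ap=0.0, know_an=5.0))
    print(aligned_triplet_loss(a, p, n, know_ap=0.0, know_an=3.0))
```

In this sketch the first call has latent distances that already match the knowledge distances, so only the triplet term could contribute; the second call adds a penalty because the negative sits farther away in latent space than the hierarchy says it should.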

