
Graph Attention Transformer Network for Multi-Label Image Classification (2203.04049v2)

Published 8 Mar 2022 in cs.CV

Abstract: Multi-label classification aims to recognize multiple objects or attributes in an image. However, it is challenging to learn a proper label graph that effectively characterizes the correlations or dependencies between labels. Current methods often use the co-occurrence probability of labels in the training set as the adjacency matrix to model this correlation, which ties the graph to the dataset and limits the model's generalization ability. In this paper, we propose the Graph Attention Transformer Network (GATN), a general framework for multi-label image classification that can effectively mine complex inter-label relationships. First, we use the cosine similarity between label word embeddings as the initial correlation matrix, which can represent rich semantic information. Subsequently, we design a graph attention transformer layer that transfers this adjacency matrix to the current domain. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on three datasets.
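The first step in the abstract, building an initial correlation matrix from the cosine similarity of label word embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cosine_similarity_matrix`, the three labels, and the tiny 3-dimensional vectors are hypothetical stand-ins for real pretrained word embeddings, and the graph attention transformer layer that later adapts the matrix is not shown.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine-similarity matrix for a list of vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in embeddings]
    n = len(embeddings)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            denom = norms[i] * norms[j]
            # Guard against zero vectors, which have no defined direction.
            matrix[i][j] = dot / denom if denom > 0 else 0.0
    return matrix


# Hypothetical toy embeddings; a real setup would load pretrained word vectors.
label_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
labels = list(label_embeddings)

# A[i][j] is the initial correlation between labels[i] and labels[j].
A = cosine_similarity_matrix([label_embeddings[name] for name in labels])

for name, row in zip(labels, A):
    print(name, [round(value, 3) for value in row])
```

Because the matrix is computed from word embeddings rather than from label co-occurrence counts, it does not depend on the statistics of any particular training set, which is the motivation the abstract gives for this choice.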
