Multi-Label Image Recognition with Graph Convolutional Networks (1904.03582v1)

Published 7 Apr 2019 in cs.CV and cs.LG

Abstract: The task of multi-label image recognition is to predict a set of object labels that present in an image. As objects normally co-occur in an image, it is desirable to model the label dependencies to improve the recognition performance. To capture and explore such important dependencies, we propose a multi-label classification model based on Graph Convolutional Network (GCN). The model builds a directed graph over the object labels, where each node (label) is represented by word embeddings of a label, and GCN is learned to map this label graph into a set of inter-dependent object classifiers. These classifiers are applied to the image descriptors extracted by another sub-net, enabling the whole network to be end-to-end trainable. Furthermore, we propose a novel re-weighted scheme to create an effective label correlation matrix to guide information propagation among the nodes in GCN. Experiments on two multi-label image recognition datasets show that our approach obviously outperforms other existing state-of-the-art methods. In addition, visualization analyses reveal that the classifiers learned by our model maintain meaningful semantic topology.

Authors

Zhao-Min Chen
Xiu-Shen Wei
Peng Wang
Yanwen Guo

Summary

The paper presents a multi-label image classification model built on Graph Convolutional Networks (GCNs). Multi-label image recognition aims to predict the set of object labels present in an image. Because objects often co-occur within images, modeling the dependencies between labels can improve recognition performance. The paper uses a GCN to capture and exploit these dependencies.

Key Contributions

The paper makes three main contributions:

GCN-Based Classification Model: The central contribution is a GCN-based framework, termed ML-GCN. The model builds a directed graph over the object labels, in which each node corresponds to a label and is represented by that label's word embedding. A GCN maps this label graph to a set of inter-dependent object classifiers. These classifiers are then applied to image descriptors extracted by a separate sub-network, so the whole network is end-to-end trainable.
Re-weighted Scheme: The authors propose a re-weighted scheme for constructing the label correlation matrix. This matrix guides information propagation among the nodes of the GCN, and the re-weighting balances how much each node's update draws on its own features versus those of its neighbors. The scheme is intended to mitigate overfitting and over-smoothing, two common issues in GCN applications.
Empirical Validation: The method is evaluated on two benchmark multi-label image recognition datasets, MS-COCO and VOC 2007. ML-GCN outperforms the state-of-the-art methods it is compared against.
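The first contribution can be written compactly in symbols. The notation below is assumed for illustration and does not come from this summary: C is the number of labels, Z holds the d-dimensional word embeddings, \hat{A} is the re-weighted correlation matrix, \Theta^{(l)} are the learned layer weights, h is a non-linearity, and x is the D-dimensional image descriptor. The propagation rule is the standard GCN layer form.

```latex
% Stacked GCN layers map label embeddings to classifiers
H^{(0)} = Z \in \mathbb{R}^{C \times d}, \qquad
H^{(l+1)} = h\!\left(\hat{A}\, H^{(l)}\, \Theta^{(l)}\right)

% The output of the last layer is read as one classifier per label
W = H^{(L)} \in \mathbb{R}^{C \times D}, \qquad
\hat{y} = W x \in \mathbb{R}^{C}, \quad x \in \mathbb{R}^{D}
```

Because W is produced from the shared label graph rather than learned independently per class, the C classifiers are coupled through \hat{A}, which is what "inter-dependent" refers to above.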

Methodology

The method has three main components:

Graph Construction: The paper builds a directed graph over the object labels, with each node represented by the word embedding of its label. Label dependencies are modeled as conditional probabilities estimated from label co-occurrence in the training data. The resulting correlation matrix is binarized by thresholding and then re-weighted.
GCN Layers: The model stacks GCN layers to propagate node features across the graph. Each graph convolution updates a node's representation by aggregating information from its adjacent nodes.
Image Representation Learning: Image features are extracted with a deep convolutional neural network, specifically ResNet-101 pre-trained on ImageNet. Global max-pooling produces an image-level descriptor, to which the inter-dependent object classifiers are then applied.
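The three components above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the threshold `tau`, the self-weight parameter `p`, the LeakyReLU slope, and all array sizes are hypothetical choices, the weights are random rather than trained, and a random vector stands in for the max-pooled ResNet-101 descriptor.

```python
import numpy as np

def build_correlation_matrix(labels, tau=0.4, p=0.2):
    """labels: (num_images, num_classes) binary training annotations.
    tau and p are illustrative values, assumed for this sketch."""
    labels = labels.astype(np.float64)
    co_occurrence = labels.T @ labels            # M[i, j]: images containing both i and j
    counts = np.diag(co_occurrence).copy()       # N[i]: images containing label i
    np.fill_diagonal(co_occurrence, 0.0)
    # Conditional probability P(L_j | L_i) = M[i, j] / N[i]; not symmetric,
    # which is why the label graph is directed.
    cond_prob = co_occurrence / np.maximum(counts, 1.0)[:, None]
    binary = (cond_prob >= tau).astype(np.float64)   # drop weak, noisy edges
    # Re-weighting: each node keeps weight 1 - p for itself and spreads p
    # evenly over its neighbours (nodes without neighbours keep only 1 - p).
    row_sums = binary.sum(axis=1, keepdims=True)
    neighbours = p * binary / np.maximum(row_sums, 1.0)
    return neighbours + (1.0 - p) * np.eye(labels.shape[1])

def gcn_layer(adj, features, weight, activate=True):
    """One graph convolution: aggregate neighbour features, then project."""
    out = adj @ features @ weight
    # LeakyReLU with slope 0.2 (an assumed choice of non-linearity)
    return np.where(out > 0, out, 0.2 * out) if activate else out

def ml_gcn_scores(image_feature, adj, embeddings, w1, w2):
    """Map label embeddings to classifiers, then score one image descriptor."""
    hidden = gcn_layer(adj, embeddings, w1)
    classifiers = gcn_layer(adj, hidden, w2, activate=False)  # (C, D)
    return classifiers @ image_feature                         # (C,)

# Toy usage: 4 training images, 3 labels, random untrained weights.
rng = np.random.default_rng(0)
labels = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 1], [0, 0, 1]])
adj = build_correlation_matrix(labels)
embeddings = rng.normal(size=(3, 5))     # stand-in for word embeddings
w1 = rng.normal(size=(5, 8))
w2 = rng.normal(size=(8, 16))
image_feature = rng.normal(size=16)      # stand-in for the pooled CNN descriptor
scores = ml_gcn_scores(image_feature, adj, embeddings, w1, w2)
```

In the toy data, label 2 always appears without label 1 and only half the time with label 0, so `adj` ends up asymmetric: label 2 receives information from label 0, but not the other way round.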

Results and Analysis

The experiments show that ML-GCN improves multi-label image recognition performance on both benchmarks:

MS-COCO Dataset: ML-GCN achieves a mean Average Precision (mAP) of 83.0%, higher than the previous best model it is compared against. Average per-class precision (CP) and recall (CR) also improve.
VOC 2007 Dataset: The model attains an mAP of 94.0%, again surpassing existing approaches. This suggests that the method generalizes across datasets.
Visualization: A t-SNE visualization of the learned classifiers indicates that they maintain a meaningful semantic topology. Classifiers for semantically related concepts lie close together, which aids interpretation of the learned model.
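For readers unfamiliar with the metrics named above, the following sketch shows one common way to compute mAP, CP, and CR for multi-label predictions. It is an illustration under assumptions, not the paper's evaluation code: the function names are hypothetical, average precision is computed in its non-interpolated form, and the 0.5 decision threshold for CP and CR is an assumed convention.

```python
import numpy as np

def average_precision(scores, targets):
    """Non-interpolated AP for one class: mean precision at each true positive."""
    order = np.argsort(-scores, kind="stable")   # rank images by descending score
    hits = targets[order]
    if hits.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

def mean_average_precision(scores, targets):
    """mAP: average of the per-class AP values."""
    per_class = [average_precision(scores[:, c], targets[:, c])
                 for c in range(targets.shape[1])]
    return float(np.mean(per_class))

def per_class_precision_recall(scores, targets, threshold=0.5):
    """CP and CR: precision and recall computed per class, then averaged."""
    predicted = scores >= threshold
    positives = targets == 1
    correct = (predicted & positives).sum(axis=0)
    cp = np.mean(correct / np.maximum(predicted.sum(axis=0), 1))
    cr = np.mean(correct / np.maximum(positives.sum(axis=0), 1))
    return float(cp), float(cr)

# Toy usage: 4 images, 2 classes (rows are images, columns are classes).
scores = np.array([[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]])
targets = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
m_ap = mean_average_precision(scores, targets)
cp, cr = per_class_precision_recall(scores, targets)
```

mAP depends only on the ranking of scores within each class, while CP and CR depend on the chosen threshold, so the two kinds of metric can move independently.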

Implications and Future Directions

The work has both practical and theoretical implications:

Practical Applications: Better multi-label image recognition could benefit applications such as autonomous driving, medical image analysis, and retail checkout systems. Because ML-GCN is end-to-end trainable and models label dependencies explicitly, it is a plausible candidate for such settings.
Theoretical Insights: The work extends GCNs beyond their traditional tasks by using them to generate classifiers. Its re-weighted scheme offers one way to limit over-smoothing in graph-based models.

Looking ahead, there are several promising directions for future research:

Model Scalability: Investigating scalable GCN architectures that can handle larger and more complex datasets efficiently.
Dynamic Label Graphs: Developing methods where the label graph can dynamically adapt based on context, potentially improving performance in diverse scenarios.
Integration with Transformer Models: Exploring synergies between GCNs and transformer models could lead to even more effective representations, leveraging the strengths of both approaches.

In conclusion, the paper shows that GCNs can model label dependencies effectively for multi-label image recognition. It reports results on MS-COCO and VOC 2007 that surpass the methods it is compared against, and it points to several directions for future work.
