Graph Convolutional Networks for Text Classification (1809.05679v3)

Published 15 Sep 2018 in cs.CL and cs.AI

Abstract: Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolution on a regular grid, e.g., a sequence) to classification. However, only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on a non-grid structure, e.g., an arbitrary graph) for the task. In this work, we propose to use graph convolutional networks for text classification. We build a single text graph for a corpus based on word co-occurrence and document-word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus. Our Text GCN is initialized with one-hot representations for words and documents; it then jointly learns the embeddings for both words and documents, supervised by the known class labels of documents. Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. On the other hand, Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods becomes more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to limited training data in text classification.

Authors (3)
  1. Liang Yao (29 papers)
  2. Chengsheng Mao (25 papers)
  3. Yuan Luo (127 papers)
Citations (1,700)

Summary

The paper "Graph Convolutional Networks for Text Classification" by Liang Yao, Chengsheng Mao, and Yuan Luo presents an innovative approach to tackle the text classification problem by leveraging graph convolutional networks (GCNs). This paper distinguishes itself by modeling an entire corpus as a heterogeneous graph with words and documents as nodes, and it subsequently transforms the text classification challenge into a node classification problem using graph neural networks.

In traditional text classification, text representation has often relied on hand-crafted features such as bag-of-words or n-grams. More recently, deep learning models like convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been applied, making significant strides by efficiently learning textual features directly from the data. However, these models primarily capture local sequential information and may fall short in leveraging global word co-occurrence patterns that span across the entire corpus.

Graph neural networks (GNNs) have the potential to overcome these limitations by preserving global structure information and effectively capturing relational data inherent in graphs. Their ability to generalize well-established models like CNNs to non-grid structures makes them particularly suitable for this task. The proposed method, termed Text Graph Convolutional Network (Text GCN), demonstrates the utility of GCNs for text classification in a novel way.

Methodology

The authors construct a single large graph over the entire corpus, with nodes for both words and documents. An edge between two word nodes is weighted by the pointwise mutual information (PMI) of the pair, computed from word co-occurrence statistics within a fixed-size sliding window over the corpus; an edge between a document node and a word node is weighted by the word's term frequency-inverse document frequency (TF-IDF) score for that document. This construction preserves global word co-occurrence information rather than only local, sequence-level context.
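The code below is a minimal sketch of this graph-building step, assuming scikit-learn's TfidfVectorizer for the document-word block; the function name build_text_graph, the whitespace tokenization, and the dense adjacency matrix are illustrative simplifications rather than the authors' released code (the paper's default sliding-window size is 20):

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def build_text_graph(docs, window_size=20):
    """Build the heterogeneous adjacency matrix described in the paper:
    word-word edges weighted by positive PMI, document-word edges
    weighted by TF-IDF, plus self-loops on every node."""
    # Document-word block: TF-IDF weights.
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(docs)          # shape: (n_docs, n_words)
    vocab = vectorizer.get_feature_names_out()
    n_docs, n_words = tfidf.shape
    word_idx = {w: i for i, w in enumerate(vocab)}

    # Sliding-window counts for PMI. A crude whitespace tokenizer is
    # used here for brevity; any tokenizer consistent with the
    # vectorizer's vocabulary would do.
    win_count = 0
    single = Counter()   # number of windows containing word i
    pair = Counter()     # number of windows containing the pair (i, j)
    for doc in docs:
        tokens = [t for t in doc.lower().split() if t in word_idx]
        for s in range(max(1, len(tokens) - window_size + 1)):
            window = set(tokens[s:s + window_size])
            win_count += 1
            single.update(window)
            pair.update(combinations(sorted(window), 2))

    # Assemble the (n_docs + n_words) x (n_docs + n_words) adjacency.
    n = n_docs + n_words
    A = np.eye(n)                                   # self-loops
    A[:n_docs, n_docs:] = tfidf.toarray()           # document -> word
    A[n_docs:, :n_docs] = tfidf.toarray().T         # word -> document
    for (wi, wj), nij in pair.items():
        # PMI(i, j) = log( p(i, j) / (p(i) p(j)) ) with window counts.
        pmi = math.log(nij * win_count / (single[wi] * single[wj]))
        if pmi > 0:                                 # keep only positive PMI
            i, j = n_docs + word_idx[wi], n_docs + word_idx[wj]
            A[i, j] = A[j, i] = pmi
    return A
```

For a real corpus the adjacency should be stored sparsely, since the number of nodes (documents plus vocabulary) can easily reach the hundreds of thousands; the dense matrix here only keeps the sketch short.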

The Text GCN architecture is a two-layer GCN whose input feature matrix is simply the identity, i.e., a one-hot vector per word and document node. The second-layer node embeddings have dimensionality equal to the number of classes and, after a softmax, serve directly as the classification output. A cross-entropy loss over the labeled document nodes guides training, enabling the model to jointly learn embeddings that are both predictive and interpretable.
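To make the propagation rule concrete, here is a minimal PyTorch sketch of the model Z = softmax(A_hat ReLU(A_hat X W0) W1) with A_hat = D^(-1/2) A D^(-1/2); the class TextGCN and the masked-loss snippet are assumptions for illustration, not the authors' implementation (the 200-dimensional hidden layer and 0.02 learning rate do follow the settings reported in the paper):

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize_adj(A):
    """Symmetrically normalize an adjacency matrix that already
    contains self-loops: A_hat = D^(-1/2) A D^(-1/2)."""
    d_inv_sqrt = np.power(A.sum(axis=1), -0.5)
    return torch.tensor(A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :],
                        dtype=torch.float32)

class TextGCN(nn.Module):
    """Two-layer GCN. With the identity as input (one-hot nodes), the
    first layer's weight matrix doubles as a lookup table of word and
    document embeddings. Returns logits; the softmax is folded into
    the cross-entropy loss during training."""
    def __init__(self, n_nodes, hidden=200, n_classes=3):
        # hidden=200 per the paper; n_classes is dataset-specific.
        super().__init__()
        self.w0 = nn.Linear(n_nodes, hidden, bias=False)
        self.w1 = nn.Linear(hidden, n_classes, bias=False)

    def forward(self, a_hat, x):
        h = F.relu(a_hat @ self.w0(x))   # first-layer node embeddings
        return a_hat @ self.w1(h)        # per-node class logits

# Illustrative usage on a tiny synthetic graph (real inputs would come
# from the graph-construction sketch above). The loss is restricted to
# the rows of labeled training documents via an index mask.
n = 6
A = np.eye(n) + np.random.rand(n, n) * (np.random.rand(n, n) > 0.5)
A = (A + A.T) / 2                        # keep the adjacency symmetric
a_hat, x = normalize_adj(A), torch.eye(n)
model = TextGCN(n_nodes=n, n_classes=3)
opt = torch.optim.Adam(model.parameters(), lr=0.02)
train_idx = torch.tensor([0, 1])         # pretend nodes 0-1 are labeled docs
labels = torch.tensor([0, 2])
loss = F.cross_entropy(model(a_hat, x)[train_idx], labels)
loss.backward()
opt.step()
```

Restricting the loss to labeled document rows is what makes the setup semi-supervised: unlabeled documents and all word nodes still shape the learned embeddings through the graph convolutions, even though they contribute nothing to the loss directly.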

Experimental Evaluation

The experimental results underscore the competitiveness of Text GCN against a suite of baseline models on five benchmark datasets: 20-Newsgroups (20NG), R8, R52, Ohsumed, and Movie Review (MR). Notably, the model surpasses alternative state-of-the-art methods in most settings without relying on external word embeddings or pre-trained models.

On 20-Newsgroups, R8, R52, and Ohsumed, Text GCN achieves test accuracies of 86.34%, 97.07%, 93.56%, and 68.36%, respectively. These results highlight the effectiveness of Text GCN in scenarios where global word co-occurrence information is crucial. Moreover, the model's robustness with limited training data is particularly noteworthy, suggesting potential applications in low-resource environments.

Implications and Future Directions

The superiority of Text GCN in leveraging global co-occurrence statistics can significantly influence future research in text classification, especially in domains requiring extensive contextual understanding. Practically, this method can be applied in areas like large-scale document organization, spam detection, and sentiment analysis.

Despite its strengths, the transductive nature of the current GCN formulation is a limitation: the graph must be rebuilt and the model retrained to handle unseen documents. Future work might therefore adapt Text GCN to inductive settings, where the model generalizes efficiently to new documents. Enhancements could also include attention mechanisms that let the model selectively focus on the more salient parts of the text graph.

In conclusion, the paper presents a robust methodological advancement in text classification by harnessing the power of Graph Neural Networks. Future research, leveraging inductive learning and exploring unsupervised frameworks, can further expand the utility and applicability of Text GCN in broader NLP and AI contexts.