- The paper proposes a meta-learning framework that leverages distributional signatures, via an attention generator and a ridge regressor, for few-shot text classification in NLP.
- The model outperforms prototypical-network baselines by an average of 20.0% in 1-shot classification across six benchmark datasets.
- The approach improves generalization under scarce labeled data by transferring attention learned from distributional statistics across tasks, rather than relying on raw lexical features.
Few-shot Text Classification with Distributional Signatures
The paper presents a novel approach to few-shot text classification by leveraging meta-learning techniques, traditionally successful in computer vision, to address challenges unique to NLP. Specifically, the paper proposes a method that utilizes distributional signatures to enhance the transferability of representations across various text classification tasks.
Methodology Overview
The primary challenge in applying meta-learning to text classification lies in the fact that lexical features, which are often pivotal in one text classification task, might be irrelevant in another. To overcome this, the authors introduce a method that emphasizes learning from distributional signatures rather than raw lexical features. These signatures encode word occurrence patterns that help in generalizing across different classes and tasks.
The proposed model functions within a meta-learning framework with two main components:
- Attention Generator: This component translates distributional signatures into attention scores that reflect how informative each word is for the classification task at hand (see the first sketch after this list). It combines information from two sources:
  - General word importance, derived from unigram frequencies in a large corpus.
  - Class-specific importance, estimated from the small support set of the task's classes.
  A biLSTM fuses these two statistics, and attention scores are produced via a dot-product attention mechanism over its hidden states.
- Ridge Regressor: Using the attention scores from the first component, this head constructs attention-weighted lexical representations of the text and fits them to the few labeled examples available in each episode. Because ridge regression admits a closed-form solution, every operation is differentiable and the model can be meta-trained end to end; a learned softmax calibration step converts the regressor's outputs into classification probabilities (see the second sketch below).
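To make the attention generator concrete, below is a minimal PyTorch sketch. It assumes the paper's general-importance statistic s(x) = ε / (ε + P(x)) with ε = 1e-3; the class-specific statistic is taken as a pre-computed input, and all module and parameter names (AttentionGenerator, hidden_dim, and so on) are illustrative rather than the authors' exact implementation.

```python
# Minimal sketch, not the authors' code: names and sizes are illustrative.
import torch
import torch.nn as nn

class AttentionGenerator(nn.Module):
    """Maps per-word distributional statistics to attention weights."""
    def __init__(self, hidden_dim: int = 50):
        super().__init__()
        # Each token contributes two statistics:
        # [general importance, class-specific importance]
        self.bilstm = nn.LSTM(input_size=2, hidden_size=hidden_dim,
                              bidirectional=True, batch_first=True)
        self.scorer = nn.Linear(2 * hidden_dim, 1, bias=False)  # dot-product scorer

    def forward(self, general_imp, class_imp):
        # general_imp, class_imp: (batch, seq_len) statistics per token
        stats = torch.stack([general_imp, class_imp], dim=-1)   # (B, T, 2)
        hidden, _ = self.bilstm(stats)                          # (B, T, 2H)
        scores = self.scorer(hidden).squeeze(-1)                # (B, T)
        return torch.softmax(scores, dim=-1)                    # attention weights

def general_importance(unigram_prob, eps=1e-3):
    # Down-weights frequent words: s(x) = eps / (eps + P(x))
    return eps / (eps + unigram_prob)
```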
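The ridge regressor admits an equally compact sketch. It uses the standard closed-form ridge solution in its Woodbury form, so the matrix inverted is N x N (N support examples) rather than d x d; `lam`, `a`, and `b` stand in for the meta-learned regularization and calibration parameters, and the function name is hypothetical.

```python
# Minimal sketch of the ridge-regression head, assuming the shapes below.
import torch

def ridge_fit_predict(phi_s, y_s, phi_q, lam, a, b):
    """Closed-form ridge fit on the support set, applied to the queries.

    phi_s: (N, d) attention-weighted support representations
    y_s:   (N, C) one-hot support labels
    phi_q: (M, d) query representations
    """
    n = phi_s.shape[0]
    # Woodbury form: invert an N x N Gram matrix, since N << d in few-shot episodes
    gram = phi_s @ phi_s.T + lam * torch.eye(n)
    w = phi_s.T @ torch.linalg.solve(gram, y_s)   # (d, C) regression weights
    logits = a * (phi_q @ w) + b                  # learned softmax calibration
    return torch.log_softmax(logits, dim=-1)
```

Since every step here is a differentiable tensor operation, meta-training can backpropagate through the per-episode fit into the attention generator.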
Numerical Results and Claims
The model was evaluated on six benchmark datasets spanning both general text classification and relation classification. The paper reports strong results: in 1-shot classification, the model outperformed prototypical-network baselines by an average margin of 20.0%, indicating that distributional signatures substantially strengthen few-shot learning in NLP.
Theoretical Insights
The authors prove that the attention generator is robust to input perturbations, specifically word substitutions that preserve unigram probabilities. Because the generator operates on distributional statistics rather than word identities, such substitutions leave its output unchanged, so features that are important for a task remain discriminative even when the textual inputs are perturbed.
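A toy numerical illustration of why this holds: the generator never sees word identities, only their distributional statistics, so substituting a word for another with the same unigram probability yields an identical input signature and therefore identical attention. The vocabulary and probabilities below are invented for illustration.

```python
# Toy check: an equal-probability substitution leaves the signature unchanged.
import torch

unigram_p = {"movie": 0.010, "film": 0.010, "superb": 0.0001}  # made-up corpus stats

def signature(tokens, eps=1e-3):
    # General importance s(x) = eps / (eps + P(x)) for each token
    return torch.tensor([eps / (eps + unigram_p[t]) for t in tokens])

s1 = signature(["superb", "movie"])
s2 = signature(["superb", "film"])   # swap "movie" -> "film", same P(x)
assert torch.equal(s1, s2)           # identical inputs => identical attention
```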
Implications and Future Directions
This work has significant implications for the field of NLP, particularly in scenarios where labeled data is scarce and task-specific model adaptation is crucial. The approach shows promise for improving generalization in NLP tasks with limited annotations, leveraging distributional statistics to transfer attention across tasks effectively.
Looking forward, future research could explore richer forms of distributional signatures, for instance context-aware statistics or sub-word-level features, to handle tasks with challenging lexical or morphological variation.
Additionally, this work sets the stage for examining the convergence between meta-learning and unsupervised pre-training paradigms, considering how broader distributional and contextual information could further enhance few-shot learning in NLP.
Overall, the paper provides a compelling argument for reconsidering traditional NLP methodologies and exploring meta-learning frameworks driven by distributional signatures.