Nominal Metaphor Generation with Multitask Learning (2206.05195v3)

Published 10 Jun 2022 in cs.CL

Abstract: Metaphor generation is a challenging task which can impact many downstream tasks such as improving user satisfaction with dialogue systems and story generation. This paper tackles the problem of Chinese nominal metaphor generation by introducing a multitask metaphor generation framework with self-training and metaphor identification mechanisms. Self-training addresses the data scarcity issue of metaphor datasets. That is, instead of solely relying on labelled metaphor datasets which are usually small in size, self-training helps identify potential metaphors from a large-scale unlabelled corpus for metaphor generation. The metaphor weighting mechanism enables our model to focus on the metaphor-related parts of the input (e.g., the comparison of the metaphor and comparator) during model learning and thus improves the metaphoricity of the generated metaphors. Our model is trained on an annotated corpus consisting of 6.3k sentences that contain diverse metaphorical expressions. Experimental results show that our model is able to generate metaphors with better readability and creativity compared to the baseline models, even in the situation where training data is insufficient.

The paper "Nominal Metaphor Generation with Multitask Learning" presents an innovative approach to the automatic generation of Chinese nominal metaphors through a framework that employs multitask learning and addresses key challenges related to data scarcity and model inefficacy in current metaphor generation methodologies. Metaphor generation can significantly enhance various natural language generation (NLG) downstream tasks, such as dialogue systems and story generation, by making language more engaging and vivid.

Key Contributions and Methodology

  1. Metaphor Generation Framework: The proposed framework integrates self-training and metaphor identification mechanisms. These techniques aim to overcome the limitations imposed by limited labeled metaphor datasets. Self-training leverages large-scale unlabelled data to improve metaphor generation, whereas metaphor weighting focuses model training on metaphor-relevant segments of the input.
  2. Self-Training Mechanism: The self-training approach involves:
    • Training a teacher model on a small labeled dataset.
    • Using this model to identify potential metaphors in unlabelled data.
    • Training a student model on a combination of labeled and newly identified metaphor data.

This iterative process allows the model to enhance its metaphor generation capabilities by utilizing additional unlabelled corpora.
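
To make this teacher-student procedure concrete, the following is a minimal, framework-agnostic sketch in Python. The `train_fn` and `score_fn` callables are placeholders standing in for the paper's generation model and metaphor identification module; the number of rounds and the confidence threshold are illustrative assumptions, not the authors' settings.

```python
from typing import Callable, List, Sequence

def self_train(
    train_fn: Callable[[List[str]], object],
    score_fn: Callable[[object, str], float],
    labelled_data: List[str],
    unlabelled_corpus: Sequence[str],
    rounds: int = 3,
    threshold: float = 0.9,
):
    """Generic teacher-student self-training loop (illustrative sketch).

    train_fn -- fits a generation model on a list of sentences (placeholder).
    score_fn -- returns a metaphoricity score in [0, 1] for a sentence,
                standing in for the metaphor identification component.
    """
    # 1. Train a teacher model on the small labelled metaphor dataset.
    teacher = train_fn(labelled_data)

    for _ in range(rounds):
        # 2. Pseudo-label: keep unlabelled sentences the teacher deems metaphorical.
        pseudo = [s for s in unlabelled_corpus if score_fn(teacher, s) >= threshold]

        # 3. Train a student on labelled data plus newly identified metaphors,
        #    then let the student act as the teacher for the next round.
        teacher = train_fn(labelled_data + pseudo)

    return teacher
```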

  3. Metaphor Weighting: The metaphor weighting mechanism assigns higher importance to the metaphorical components of the input during training. This approach is facilitated by metaphor identification, which detects metaphor-relevant parts of a sentence and emphasizes them without requiring extensive labeled data (a rough illustration of this weighting idea follows the list below).
  4. Corpora Development: To facilitate the training and evaluation of the proposed model, the researchers developed two corpora:
    • Chinese Metaphor Corpus (CMC): This corpus includes 2.7k metaphorical and 3.5k literal examples for training.
    • Chinese Literature Corpus (CLC): A large-scale unlabelled Chinese literature corpus is used for the self-training component.
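
As a rough illustration of the weighting idea in item 3, the snippet below sketches a token-level weighted cross-entropy in PyTorch: tokens flagged as metaphor-related by a hypothetical 0/1 mask receive a larger loss weight. The mask source, weight value, and loss form are assumptions for illustration; the paper's actual weighting scheme may differ.

```python
import torch
import torch.nn.functional as F

def weighted_metaphor_loss(logits, targets, metaphor_mask, metaphor_weight=2.0):
    """Token-level cross-entropy that up-weights metaphor-related tokens.

    logits        -- (batch, seq_len, vocab) decoder outputs
    targets       -- (batch, seq_len) gold token ids
    metaphor_mask -- (batch, seq_len), 1 where a token is metaphor-related
                     (e.g. tenor/vehicle spans), 0 elsewhere; assumed to come
                     from a metaphor identification step.
    """
    # Per-token negative log-likelihood, unreduced so it can be reweighted.
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)

    # Metaphor-related tokens contribute `metaphor_weight` times as much.
    weights = 1.0 + (metaphor_weight - 1.0) * metaphor_mask.float()
    return (weights * nll).sum() / weights.sum()
```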

Experimental Evaluation

Experiments were conducted using both automatic metrics and human evaluations. The proposed model was compared with baseline approaches, including LSTM-based generative models, SeqGAN, GPT-2, BART, and SCOPE, a state-of-the-art English simile generator. The key findings from the experimental analysis included:

  • Automatic Metrics: The proposed multitask framework demonstrated superior performance in terms of fluency (perplexity), diversity (Dist-1 and Dist-2 scores; a minimal computation is sketched after this list), and, notably, metaphoricity of the generated outputs.
  • Human Evaluation: Human assessments also showed that metaphors generated by the proposed method were rated higher in consistency and creativity. The framework was able to produce metaphors with more poetic richness and contextual coherence.
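
For reference, Dist-n measures lexical diversity as the ratio of distinct n-grams to the total number of n-grams across all generated outputs. A minimal computation, assuming pre-tokenised generations and shown here with made-up example sentences, might look like this:

```python
from typing import List

def distinct_n(generations: List[List[str]], n: int) -> float:
    """Dist-n: unique n-grams divided by total n-grams across all outputs."""
    total, unique = 0, set()
    for tokens in generations:
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0

# Example with two (hypothetical) tokenised generations.
outputs = [["the", "moon", "is", "a", "silver", "coin"],
           ["her", "smile", "is", "a", "sunrise"]]
print(distinct_n(outputs, 1), distinct_n(outputs, 2))
```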

Conclusions

The paper presents a robust neural metaphor generation system capable of creating nominal metaphors using minimal labeled data due to its efficient self-training and metaphor weighting techniques. This method represents a significant advancement in metaphor generation, particularly in the context of Chinese language processing, where the richness of metaphoric expressions plays a critical role. Future work is suggested to explore richer linguistic constructs and extend the framework to other languages, potentially enhancing the applicability and versatility of metaphor generation systems in NLG tasks.

Authors (3)
  1. Yucheng Li (31 papers)
  2. Chenghua Lin (127 papers)
  3. Frank Guerin (1 paper)
Citations (13)