A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation

Published 15 Jan 2020 in cs.CL (arXiv:2001.05139v1)

Abstract: Story generation, namely generating a reasonable story from a leading context, is an important but challenging task. In spite of the success in modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding the causal relationships, and planning entities and events with proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation. We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences in a reasonable story, we employ multi-task learning which combines a discriminative objective to distinguish true and fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.

Citations (230)

Summary

  • The paper presents a novel pretraining framework that integrates external commonsense knowledge with multi-task learning to enhance GPT-2’s story generation capabilities.
  • The method post-trains GPT-2 on sentences constructed from knowledge-graph triples and fine-tunes it with an auxiliary classification task to improve narrative coherence and logical progression.
  • Experimental results show improved perplexity, BLEU scores, and coherence over state-of-the-art baselines, highlighting the benefits of structured knowledge integration.


The paper presents a study on story generation, focused on enhancing the generative capabilities of neural models by leveraging external commonsense knowledge. Despite advances in language modeling, models such as GPT-2 still struggle to produce coherent, logical narratives, often falling into repetition and logical inconsistencies. This work addresses these issues by introducing a knowledge-enhanced pretraining approach combined with multi-task learning.

Methodology

The core contribution of this paper is a framework that integrates knowledge from external knowledge bases, ConceptNet and ATOMIC, into a transformer-based LLM, specifically GPT-2. The framework enhances GPT-2 through a two-stage process: post-training on external knowledge and fine-tuning with multi-task learning.

  1. Knowledge Incorporation: The model is post-trained on sentences constructed from knowledge-graph triples via templates. This allows the model to internalize commonsense knowledge without modifying its core architecture (a minimal post-training sketch follows this list).
  2. Multi-Task Learning: To capture causal and temporal dependencies within stories, fine-tuning adds an auxiliary classification task that distinguishes true stories from artificially constructed fake ones, helping the model learn logical coherence and sentence ordering (a sketch of the combined objective also follows this list).
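
To make the knowledge-incorporation step concrete, here is a minimal sketch of post-training GPT-2 on sentences built from knowledge-graph triples. It assumes Hugging Face Transformers and PyTorch; the relation templates and the post_train_step helper are illustrative placeholders, not the paper's exact templates or training code.

    # Minimal sketch: post-train GPT-2 on sentences derived from knowledge triples.
    # Templates and helper names are illustrative assumptions.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

    # Hypothetical templates mapping relation types to natural-language patterns;
    # the paper defines its own templates for ConceptNet and ATOMIC relations.
    TEMPLATES = {
        "Causes": "{head} causes {tail}.",
        "HasSubevent": "{head} includes the event {tail}.",
        "xNeed": "Before {head}, PersonX needs {tail}.",
    }

    def triple_to_sentence(head, relation, tail):
        """Turn a (head, relation, tail) knowledge triple into a training sentence."""
        return TEMPLATES[relation].format(head=head, tail=tail)

    def post_train_step(sentence):
        """One language-modeling update on a transformed knowledge sentence."""
        inputs = tokenizer(sentence, return_tensors="pt")
        outputs = model(**inputs, labels=inputs["input_ids"])  # causal LM loss
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return outputs.loss.item()

    loss = post_train_step(triple_to_sentence("rain", "Causes", "wet streets"))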
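
Under the assumption of a simple weighted combination (with λ a balancing hyperparameter; the paper's exact formulation may differ), the fine-tuning stage can be summarized as:

    \mathcal{L}_{\text{fine-tune}} = \mathcal{L}_{\text{LM}} + \lambda \, \mathcal{L}_{\text{cls}}, \qquad
    \mathcal{L}_{\text{LM}} = -\sum_{t} \log p_\theta(w_t \mid w_{<t}), \qquad
    \mathcal{L}_{\text{cls}} = -\log p_\theta(y \mid \text{story}),

where y indicates whether a story is a true story or an artificially constructed fake one.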

Experimental Results

The authors present an extensive evaluation on the ROCStories dataset, showing that their model outperforms several state-of-the-art baselines, including hierarchical and planning-based generation models. Evaluation covers perplexity, BLEU, and judgments of grammaticality and logical coherence, combining automatic metrics with manual assessment.

  • Perplexity and BLEU: Compared to the baselines, the proposed model achieves lower perplexity (better fluency) and higher BLEU (closer n-gram overlap with human-written stories); a sketch of how these standard metrics are computed follows this list.
  • Coherence and Logicality: Manual evaluations reveal that the model generates stories with improved logical progression and reduced conflicts, supported by its ability to incorporate structured knowledge.
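
As a rough illustration (not the authors' evaluation scripts), the sketch below computes the two automatic metrics in their standard forms: perplexity from the model's average token-level cross-entropy, and corpus BLEU-2 from n-gram overlap with reference stories. The model, tokenizer, and example strings are placeholders.

    # Illustrative computation of perplexity and corpus BLEU for generated stories.
    # Generic sketch of the standard metrics, not the paper's evaluation code.
    import math
    import torch
    from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def perplexity(text):
        """Perplexity = exp(average token-level cross-entropy under the model)."""
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            loss = model(**inputs, labels=inputs["input_ids"]).loss
        return math.exp(loss.item())

    def bleu2(references, hypotheses):
        """Corpus-level BLEU-2 over whitespace-tokenized stories (one reference each)."""
        refs = [[r.split()] for r in references]
        hyps = [h.split() for h in hypotheses]
        smooth = SmoothingFunction().method1
        return corpus_bleu(refs, hyps, weights=(0.5, 0.5), smoothing_function=smooth)

    ppl = perplexity("Jane went to the park. She played with her dog all afternoon.")
    score = bleu2(["jane went to the park ."], ["jane walked to the park ."])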

Implications and Future Directions

This paper contributes to the field of open-ended text generation by demonstrating the impact of integrating external knowledge into pretrained LLMs. The use of external commonsense sources alleviates some inherent blind spots of large-scale pretrained models, such as the lack of long-range coherence in story generation.

For future work, exploring the seamless integration of real-time commonsense knowledge with ongoing story generation processes could further enhance the adaptability and context-awareness of generative models. Furthermore, the approach of integrating knowledge-intensive tasks at the pretraining stage itself could be extended to other domains where background knowledge is pivotal, such as dialogue systems or domain-specific narrative generation.

Conclusion

The paper marks an important step forward in commonsense story generation by demonstrating how a knowledge-enhanced pretraining approach can significantly improve the coherence and logical soundness of generated narratives. By employing a multi-task learning framework, the model also captures and applies causal and temporal dependencies in storytelling, providing a strong reference point for LLMs on open-ended narrative tasks.
