
A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation (2001.05139v1)

Published 15 Jan 2020 in cs.CL

Abstract: Story generation, namely generating a reasonable story from a leading context, is an important but challenging task. In spite of the success in modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding the causal relationships, and planning entities and events with proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation. We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences in a reasonable story, we employ multi-task learning which combines a discriminative objective to distinguish true and fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.

A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation

The paper presents a study of story generation, specifically focused on enhancing the generative capabilities of neural models by leveraging external commonsense knowledge. Despite advances in language modeling, models such as GPT-2 struggle to generate coherent and logical story narratives, often producing repetition and logical inconsistencies. This work addresses these issues by introducing a knowledge-enhanced pretraining approach combined with multi-task learning.

Methodology

The core contribution of this paper is a novel framework that integrates knowledge from external knowledge bases such as ConceptNet and ATOMIC into a transformer-based language model, specifically GPT-2. The framework enhances GPT-2 through a two-stage process: post-training on external knowledge and fine-tuning with multi-task learning.

  1. Knowledge Incorporation: The model is post-trained on natural-language sentences constructed from knowledge-graph triples drawn from ConceptNet and ATOMIC, allowing it to internalize commonsense knowledge without modifying its core architecture (see the first sketch after this list).
  2. Multi-Task Learning: To capture causal and temporal dependencies within story contexts, fine-tuning adds an auxiliary classification task that distinguishes true stories from artificially constructed fake ones, helping the model recognize logical coherence and correct ordering (see the second sketch after this list).
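
The summary above does not reproduce the paper's exact transformation templates, so the following is a minimal sketch, assuming simple hand-written templates, of how (head, relation, tail) triples from ConceptNet or ATOMIC might be rendered as ordinary sentences for post-training. The `TEMPLATES` table and the `triple_to_sentence` helper are hypothetical names used only for illustration.

```python
# Minimal sketch: turn knowledge-graph triples into plain sentences that a
# GPT-2-style model can be post-trained on with its usual next-token objective.
# The templates below are illustrative assumptions, not the paper's exact rules.

TEMPLATES = {
    "IsA": "{head} is a kind of {tail}.",
    "CausesDesire": "{head} makes someone want to {tail}.",
    "xIntent": "{head}, because they wanted {tail}.",
}

def triple_to_sentence(head: str, relation: str, tail: str) -> str:
    """Render one (head, relation, tail) triple as a training sentence."""
    template = TEMPLATES.get(relation, "{head} {relation} {tail}.")
    return template.format(head=head, relation=relation, tail=tail)

# A ConceptNet-style triple becomes an ordinary sentence.
print(triple_to_sentence("campfire", "CausesDesire", "roast marshmallows"))
# campfire makes someone want to roast marshmallows.
```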
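For the fine-tuning stage, the sketch below shows, in PyTorch, one way a language-modeling objective and a true-vs-fake story classification objective can be combined into a single training loss. The weighting term `alpha`, the tensor shapes, and the function name are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multitask_loss(lm_logits, lm_targets, cls_logits, cls_targets, alpha=1.0):
    """Combine next-token prediction on true stories with an auxiliary
    true-vs-fake story classification loss; alpha is a hypothetical weight."""
    # Language-modeling term: (batch, seq_len, vocab) logits vs (batch, seq_len) targets.
    lm_loss = F.cross_entropy(
        lm_logits.reshape(-1, lm_logits.size(-1)), lm_targets.reshape(-1)
    )
    # Discriminative term: (batch, 2) logits vs (batch,) labels (1 = true story).
    cls_loss = F.cross_entropy(cls_logits, cls_targets)
    return lm_loss + alpha * cls_loss

# Shapes only, with random tensors, to show how the two objectives combine.
lm_logits = torch.randn(4, 16, 50257)        # 4 stories, 16 tokens, GPT-2 vocab size
lm_targets = torch.randint(0, 50257, (4, 16))
cls_logits = torch.randn(4, 2)
cls_targets = torch.randint(0, 2, (4,))
print(multitask_loss(lm_logits, lm_targets, cls_logits, cls_targets))
```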

Experimental Results

The authors present an extensive evaluation on the ROCStories dataset, showing that their model outperforms several state-of-the-art baselines, including hierarchical and planning-based generation models. Evaluation covers perplexity, BLEU scores, and judgments of grammaticality and logicality, obtained through both automatic measures and manual annotation.

  • Perplexity and BLEU: The proposed model shows better textual fluency and closer n-gram overlap with human-written stories than the baselines (a short perplexity sketch follows this list).
  • Coherence and Logicality: Manual evaluations reveal that the model generates stories with improved logical progression and reduced conflicts, supported by its ability to incorporate structured knowledge.
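
As a reminder of how the fluency metric is computed, perplexity is the exponential of the average per-token negative log-likelihood; the sketch below uses made-up token losses purely for illustration.

```python
import math

def perplexity(token_nlls):
    """Corpus perplexity: exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Made-up per-token losses; lower perplexity means the model assigns
# higher probability to the reference stories.
print(round(perplexity([2.1, 1.8, 2.5, 1.2]), 2))  # ~6.69
```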

Implications and Future Directions

This paper contributes significantly to the field of open-ended text generation by demonstrating the impact of integrating external knowledge into large pretrained language models. The use of external commonsense sources alleviates some inherent blind spots of such models, such as the lack of long-range coherence in generated stories.

For future work, exploring the seamless integration of real-time commonsense knowledge with ongoing story generation processes could further enhance the adaptability and context-awareness of generative models. Furthermore, the approach of integrating knowledge-intensive tasks at the pretraining stage itself could be extended to other domains where background knowledge is pivotal, such as dialogue systems or domain-specific narrative generation.

Conclusion

The paper makes an important step forward in commonsense story generation by demonstrating how a knowledge-enhanced pretraining approach can significantly improve the coherence and logical soundness of generated narratives. By employing a multi-task learning framework, the model also captures and applies causal and temporal dependencies in storytelling, setting a strong benchmark for pretrained language models on open-ended narrative tasks.

Authors (5)
  1. Jian Guan (65 papers)
  2. Fei Huang (408 papers)
  3. Zhihao Zhao (19 papers)
  4. Xiaoyan Zhu (54 papers)
  5. Minlie Huang (225 papers)
Citations (230)