
Hierarchical Neural Story Generation (1805.04833v1)

Published 13 May 2018 in cs.CL

Abstract: We explore story generation: creative systems that can build coherent and fluent passages of text about a topic. We collect a large dataset of 300K human-written stories paired with writing prompts from an online forum. Our dataset enables hierarchical story generation, where the model first generates a premise, and then transforms it into a passage of text. We gain further improvements with a novel form of model fusion that improves the relevance of the story to the prompt, and adding a new gated multi-scale self-attention mechanism to model long-range context. Experiments show large improvements over strong baselines on both automated and human evaluations. Human judges prefer stories generated by our approach to those from a strong non-hierarchical model by a factor of two to one.

Authors (3)
  1. Angela Fan (49 papers)
  2. Mike Lewis (78 papers)
  3. Yann Dauphin (24 papers)
Citations (1,481)

Summary

Hierarchical Neural Story Generation

In the paper "Hierarchical Neural Story Generation," Angela Fan, Mike Lewis, and Yann Dauphin present an approach for advancing automatic story generation. The work tackles the challenge of generating long, coherent, and contextually relevant stories through hierarchical generation models. The authors introduce a new large-scale dataset, novel model architectures, and training techniques that substantially improve the quality of generated narratives.

Introduction

The difficulty of story generation lies in maintaining thematic coherence across long texts while exhibiting creativity and high-level plot structure. Standard sequence-to-sequence (seq2seq) models, although successful in many text generation tasks, struggle with the open-ended nature of story prompts and the long-range dependencies required for full narrative arcs. This paper addresses these challenges with a hierarchical structure in which a prompt, or premise, is generated first, and the story is then generated conditioned on it. Grounding the story in an overarching premise encourages consistency and thematic relevance.

Dataset and Methodology

A significant contribution of this research is the collection and use of a large dataset from Reddit’s WritingPrompts forum. The dataset comprises around 300,000 human-written stories paired with writing prompts, providing a rich resource for training and evaluation. Its prompt-story pairing makes it a natural fit for the hierarchical generation model proposed by the authors.

Key statistics about the dataset include:

  • 272,600 training stories
  • 15,138 test stories
  • 15,620 validation stories
  • Average story length of 734.5 words

For preprocessing, redundant and irrelevant content was excluded, improving the dataset's quality. Tokenization retained most textual attributes so that the corpus stays close to natural, human-readable language modeling.
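
A minimal sketch of how such prompt-story pairs could be loaded and truncated for training is shown below. The file names (train.wp_source, train.wp_target), the one-example-per-line layout, and the word-level truncation are assumptions for illustration, not a description of the authors' released tooling.

```python
# Hedged sketch (not the authors' code): load prompt-story pairs from two
# parallel text files, assuming one example per line in each file, and
# truncate long stories to a fixed word budget.

def load_pairs(prompt_path, story_path, max_story_words=1000):
    """Yield (prompt, story) pairs, truncating stories to max_story_words."""
    with open(prompt_path, encoding="utf-8") as f_prompt, \
         open(story_path, encoding="utf-8") as f_story:
        for prompt, story in zip(f_prompt, f_story):
            words = story.strip().split()
            yield prompt.strip(), " ".join(words[:max_story_words])


if __name__ == "__main__":
    # Assumes the (hypothetical) files exist in the working directory.
    for prompt, story in load_pairs("train.wp_source", "train.wp_target"):
        print(prompt, "->", story[:80], "...")
        break
```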

Hierarchical Generation Models

The hierarchical model architecture comprises two primary stages:

  1. Prompt Generation: a convolutional language model generates a prompt (premise).
  2. Story Generation: a seq2seq model, trained on prompt-story pairs, generates the story conditioned on the prompt.

This two-stage generation provides high-level planning and structure, addressing the deficiencies of purely word-by-word generation in standard models; a sketch of the pipeline follows.
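
The sketch below illustrates only the control flow of the two stages. The class and method names (PromptLM, Seq2SeqStoryModel, sample, generate) are placeholders, and their bodies are stubs standing in for the real models, not the authors' implementation.

```python
# Hedged sketch of the two-stage hierarchical pipeline: first sample a
# premise from a prompt language model, then decode a story conditioned on it.
from dataclasses import dataclass
import random


@dataclass
class PromptLM:
    """Stand-in for the convolutional prompt language model."""
    premises: tuple = ("A dragon wakes up in a modern city.",
                       "The last librarian guards a forbidden book.")

    def sample(self) -> str:
        # Real model: sample a prompt token-by-token from the language model.
        return random.choice(self.premises)


@dataclass
class Seq2SeqStoryModel:
    """Stand-in for the seq2seq (fusion) story model."""

    def generate(self, prompt: str) -> str:
        # Real model: encode the prompt, then decode a multi-paragraph story.
        return f"[story conditioned on: {prompt}]"


def generate_story() -> str:
    prompt = PromptLM().sample()                  # stage 1: premise
    return Seq2SeqStoryModel().generate(prompt)   # stage 2: story


print(generate_story())
```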

Improvements in Attention Mechanism

The authors enhance the seq2seq architecture with a novel gated multi-scale self-attention mechanism. This mechanism is pivotal for modeling long documents efficiently, addressing the limited long-range context of the decoder. It lets the model attend over extended stretches of previously generated text by combining gated linear units (GLUs) with multiple attention heads that operate at different scales (downsampled views) of the sequence.
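
A hedged PyTorch sketch of one such attention head is given below. Applying GLU gating to the query/key/value projections and using a stride to emulate a coarser scale are illustrative simplifications of the paper's mechanism, not a faithful reimplementation.

```python
# Sketch of a gated, causal self-attention head with an optional coarser
# (downsampled) view of the sequence, under the assumptions stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedSelfAttentionHead(nn.Module):
    def __init__(self, dim: int, stride: int = 1):
        super().__init__()
        # GLU halves its input, so project to 2*dim before gating.
        self.q = nn.Linear(dim, 2 * dim)
        self.k = nn.Linear(dim, 2 * dim)
        self.v = nn.Linear(dim, 2 * dim)
        self.stride = stride  # >1 means the head attends to a coarser view

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        q = F.glu(self.q(x), dim=-1)
        ctx = x[:, ::self.stride]                 # downsampled keys/values
        k = F.glu(self.k(ctx), dim=-1)
        v = F.glu(self.v(ctx), dim=-1)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        # Causal mask: position t may only attend to earlier (coarse) positions.
        t_idx = torch.arange(x.size(1)).unsqueeze(1)
        s_idx = torch.arange(ctx.size(1)).unsqueeze(0) * self.stride
        scores = scores.masked_fill(s_idx > t_idx, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v


x = torch.randn(2, 16, 64)
print(GatedSelfAttentionHead(64, stride=2)(x).shape)  # torch.Size([2, 16, 64])
```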

Model Fusion Technique

To mitigate the tendency of seq2seq models to ignore the prompt and degenerate into unconditional language models, the paper introduces a model fusion technique. A second seq2seq model is trained on top of a pre-trained one, so that the new model can devote its capacity to the dependency between prompt and story. This approach substantially improves the alignment of generated stories with their prompts and outperforms an ensemble of two seq2seq models while using fewer parameters.
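
The sketch below shows a cold-fusion-style combination of a frozen pretrained model's hidden states with a trainable model's hidden states through a learned gate. The layer sizes and the exact gating are assumptions for illustration, not the authors' reported architecture.

```python
# Hedged sketch: fuse hidden states from a frozen pretrained seq2seq model and
# a trainable one via a GLU gate, then project to vocabulary logits.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionHead(nn.Module):
    """Combine pretrained and trainable decoder states into output logits."""

    def __init__(self, hidden: int, vocab: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, 2 * hidden)  # GLU halves this back to `hidden`
        self.out = nn.Linear(hidden, vocab)

    def forward(self, h_pretrained: torch.Tensor, h_trainable: torch.Tensor) -> torch.Tensor:
        # h_*: (batch, time, hidden). The pretrained states are detached so only
        # the new model and the fusion layers receive gradients.
        fused = torch.cat([h_pretrained.detach(), h_trainable], dim=-1)
        fused = F.glu(self.gate(fused), dim=-1)        # (batch, time, hidden)
        return self.out(fused)                         # vocabulary logits


logits = FusionHead(hidden=256, vocab=10000)(
    torch.randn(2, 8, 256), torch.randn(2, 8, 256))
print(logits.shape)  # torch.Size([2, 8, 10000])
```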

Experimental Results

The experiments conducted with various models reflected significant improvements across both automated and human evaluation metrics. Notably:

  • Perplexity Reduction: Enhanced attention mechanisms and model fusion yielded a perplexity reduction, indicating better fluency and next-word prediction.
  • Human Preference: Human judges preferred stories from the hierarchical model over those from a strong non-hierarchical baseline by roughly two to one.
  • Prompt Consistency: The model’s ability to adhere to prompts improved, as evidenced by better performance in human prompt-story pairing tasks.

Practical and Theoretical Implications

Practically, these advances in story generation open avenues for more interactive and engaging content-creation tools in entertainment, education, and automated customer service. Theoretically, the work underscores the importance of hierarchical structure and attention mechanisms for generating coherent long-form text, setting a benchmark for future work in open-domain text generation.

Future Directions

While the paper makes significant strides, it also opens up several avenues for further exploration:

  • Enhanced Prompt Specificity: Incorporating more diverse and rare word distributions in prompt generation could yield even more creative outputs.
  • Integration with Knowledge Bases: Coupling story generation models with external knowledge bases can enhance the factual accuracy of generated content.
  • Exploration of Reinforcement Learning: Using reinforcement learning to train models based on user engagement metrics might further improve story relevance and appeal.

In summary, "Hierarchical Neural Story Generation" by Fan, Lewis, and Dauphin represents a notable advancement in the field of AI-driven creative writing. By addressing the primary challenges of coherence and prompt relevance through innovative model design and training techniques, this research sets a strong foundation for future developments in narrative generation technology.
