
Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning (2210.07792v2)

Published 14 Oct 2022 in cs.CL

Abstract: Controlled automated story generation seeks to generate natural language stories that satisfy constraints derived from natural language critiques or preferences. Existing methods for controlling story preference rely on prompt engineering, which is labor-intensive and often inconsistent. They may also use logit-manipulation methods, which require annotated datasets for the desired attributes. To address these issues, we first train a contrastive bi-encoder model, named CARP, to align stories with their corresponding human critiques, building a general-purpose preference model. This model is subsequently used as a reward function to fine-tune a generative LLM via reinforcement learning. However, simply fine-tuning a generative LLM with a contrastive reward model does not reliably yield a story generation system whose stories meet user preferences. To increase story generation robustness, we further fine-tune the contrastive reward model using a prompt-learning technique. A human participant study is then conducted comparing generations from our full system, ablations, and two baselines. We show that the full fine-tuning pipeline results in a story generator preferred over an LLM 20x as large, as well as over logit-based methods. This motivates the use of contrastive learning for general-purpose human preference modeling.
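The idea of scoring a story against a critique with a bi-encoder can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a learned encoder, the symmetric InfoNCE-style loss is an assumption (the abstract does not state the objective), and the temperature value is arbitrary. The sketch only shows how a story-critique similarity could serve as a scalar reward for a reinforcement learning step.

```python
import math
import zlib

DIM = 64  # toy embedding size (assumption, not from the paper)


def toy_embed(text, dim=DIM):
    """Hypothetical stand-in for a learned encoder:
    hashed bag-of-words vector, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(u, v):
    """Dot product of two already-normalized vectors."""
    return sum(a * b for a, b in zip(u, v))


def reward(story, critique):
    """Scalar reward: similarity between the story and critique embeddings."""
    return cosine(toy_embed(story), toy_embed(critique))


def contrastive_loss(stories, critiques, temperature=0.1):
    """Symmetric InfoNCE-style loss (assumed objective): stories[i] and
    critiques[i] are positives, all other pairings in the batch are negatives."""
    s = [toy_embed(x) for x in stories]
    c = [toy_embed(x) for x in critiques]
    n = len(s)
    logits = [[cosine(s[i], c[j]) / temperature for j in range(n)]
              for i in range(n)]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[i], computed with a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))
```

In a pipeline like the one the abstract describes, `reward(generated_story, target_critique)` would be the signal passed to the reinforcement learning update of the generator; a real system would replace `toy_embed` with trained story and critique encoders.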

Authors (8)
  1. Louis Castricato (16 papers)
  2. Alexander Havrilla (2 papers)
  3. Shahbuland Matiana (4 papers)
  4. Michael Pieler (10 papers)
  5. Anbang Ye (4 papers)
  6. Ian Yang (7 papers)
  7. Spencer Frazier (11 papers)
  8. Mark Riedl (51 papers)
Citations (12)
