Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation (2401.07382v2)
Abstract: Reinforcement learning (RL) can align large language models (LLMs) with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals: typically, there is only a single reward for an entire output. Such sparse rewards can lead to inefficient and unstable learning. To address this challenge, our paper introduces a novel framework that leverages the critique capability of LLMs to produce intermediate-step rewards during RL training. Our method couples a policy model with a critic LLM, which provides comprehensive feedback on each part of the output. This feedback is then translated into token- or span-level rewards that guide the RL training process. We investigate this approach under two settings: one where a smaller policy model is paired with a more powerful critic model, and another where a single LLM fulfills both roles. We evaluate our approach on three text generation tasks: sentiment control, LLM detoxification, and summarization. Experimental results show that incorporating such artificial intrinsic rewards significantly improves both the sample efficiency and overall performance of the policy model, as supported by both automatic and human evaluation.
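To make the reward-densification idea concrete, below is a minimal sketch (not the authors' released code) of how critique spans returned by a critic LLM could be folded into per-token rewards alongside the single terminal reward. All names here (`CritiqueSpan`, `build_token_rewards`, the `intrinsic_weight` scaling) are illustrative assumptions rather than the paper's actual implementation.

```python
# Minimal sketch: turning critic-LLM span feedback into dense per-token rewards
# that supplement the sparse terminal reward used in standard RLHF-style training.
# All names and the weighting scheme are assumptions for illustration.

from dataclasses import dataclass
from typing import List


@dataclass
class CritiqueSpan:
    """A span flagged by the critic LLM, in token indices [start, end)."""
    start: int
    end: int
    score: float  # e.g. -1.0 for a problematic span, +1.0 for a praised one


def build_token_rewards(
    num_tokens: int,
    terminal_reward: float,
    critique_spans: List[CritiqueSpan],
    intrinsic_weight: float = 0.5,
) -> List[float]:
    """Combine the sparse terminal reward with span-level intrinsic rewards.

    The holistic (extrinsic) reward is assigned to the final token only;
    every token inside a critiqued span additionally receives a small
    intrinsic reward derived from the critic's judgement.
    """
    rewards = [0.0] * num_tokens
    rewards[-1] = terminal_reward  # sparse extrinsic signal at sequence end

    for span in critique_spans:
        for t in range(span.start, min(span.end, num_tokens)):
            rewards[t] += intrinsic_weight * span.score  # dense intrinsic signal

    return rewards


if __name__ == "__main__":
    # Example: a 10-token output with overall reward 0.8, where the critic
    # flagged tokens 3-6 as problematic.
    spans = [CritiqueSpan(start=3, end=6, score=-1.0)]
    print(build_token_rewards(10, 0.8, spans))
    # -> [0.0, 0.0, 0.0, -0.5, -0.5, -0.5, 0.0, 0.0, 0.0, 0.8]
```

The resulting per-token reward vector can then be consumed by any policy-gradient method (e.g., PPO) in place of the usual terminal-only reward.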
Authors: Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng