
In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting (2408.13028v1)

Published 23 Aug 2024 in cs.CL

Abstract: In-context learning (ICL) with LLMs has attracted increasing attention in the community: LLMs make predictions based only on instructions augmented with a few examples. Existing example selection methods for ICL use sparse or dense retrievers and achieve effective performance. However, these methods do not use direct feedback from the LLM to train the retriever, and the selected examples do not necessarily improve the analogy ability of the LLM. To tackle this, we propose a policy-based reinforcement learning framework for example selection (RLS), which consists of a language model (LM) selector and an LLM generator. The LM selector encodes candidate examples into dense representations and selects the top-k examples as the demonstration for the LLM. The outputs of the LLM are used to compute the reward and policy gradient that optimize the LM selector. We conduct experiments on different datasets and significantly outperform existing example selection methods. Moreover, our approach shows advantages over supervised fine-tuned (SFT) models in the few-shot setting. Further experiments show that balancing the abundance of examples against their similarity to the test case is important for the ICL performance of LLMs.

In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting

Overview

The paper by Haowei Du and Dongyan Zhao introduces a policy-based reinforcement learning (RL) framework for example selection in in-context learning (ICL) using LLMs, applied specifically to the task of Incomplete Utterance Rewriting (IUR). The IUR task aims to transform incomplete utterances into self-contained, semantically equivalent utterances that can be understood independently of preceding context.
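To make the task setting concrete, here is a minimal sketch of how a k-shot ICL prompt for IUR could be assembled. The prompt template, the example dialogue, and the rewrite shown are illustrative assumptions for exposition, not the authors' exact format.

```python
# Illustrative sketch only: template, dialogue, and field names are assumptions,
# not the paper's actual prompt format.

def build_icl_prompt(demonstrations, context, incomplete_utterance):
    """Assemble a k-shot prompt for incomplete utterance rewriting (IUR)."""
    parts = ["Rewrite the last utterance so it is self-contained given the context.\n"]
    for demo in demonstrations:
        parts.append(
            f"Context: {demo['context']}\n"
            f"Utterance: {demo['incomplete']}\n"
            f"Rewrite: {demo['rewrite']}\n"
        )
    parts.append(f"Context: {context}\nUtterance: {incomplete_utterance}\nRewrite:")
    return "\n".join(parts)

# A hypothetical IUR case: "she" must be resolved from the preceding context.
demo = {
    "context": "A: Where was Marie Curie born? B: She was born in Warsaw.",
    "incomplete": "When did she move to Paris?",
    "rewrite": "When did Marie Curie move to Paris?",
}
print(build_icl_prompt([demo], "A: Who wrote Hamlet? B: Shakespeare did.", "When?"))
```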

Methodology

The authors propose a novel method that leverages policy-based reinforcement learning to optimize the selection of examples used for in-context demonstrations. The principal components of their framework are:

  1. LM Selector: This module encodes candidate examples into dense representations. The sentence embedding is derived from a pre-trained language model such as BERT: the context and the incomplete utterance are concatenated, and the hidden state of the "[CLS]" token represents the case.
  2. LLM Generator: Once the LM selector picks the top-k examples, they are placed in the demonstration and fed into the LLM to generate the rewritten utterance.
  3. RL Optimization: The LLM's outputs are used to compute a reward, and the resulting policy gradient updates the LM selector, so example selection is trained with direct feedback from the LLM (see the sketch after this list).
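A minimal sketch of this selector/generator loop is given below. It assumes a BERT encoder for the LM selector, an unspecified llm_generate function standing in for the frozen LLM generator, and ROUGE-L against the gold rewrite as the reward; the paper's actual reward definition, scoring function, and hyperparameters may differ.

```python
# Sketch under assumptions: BERT as the LM selector, llm_generate() as a stand-in
# for the LLM generator, ROUGE-L as the reward. Not the authors' exact implementation.
import torch
from transformers import BertTokenizer, BertModel
from rouge_score import rouge_scorer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-5)
scorer = rouge_scorer.RougeScorer(["rougeL"])

def cls_embedding(context, utterance):
    # Concatenate context and incomplete utterance; take the [CLS] hidden state.
    inputs = tokenizer(context, utterance, return_tensors="pt", truncation=True)
    return encoder(**inputs).last_hidden_state[:, 0]           # shape (1, hidden)

def select_and_update(test_case, candidates, llm_generate, k=5):
    query = cls_embedding(test_case["context"], test_case["incomplete"])
    cand = torch.cat([cls_embedding(c["context"], c["incomplete"]) for c in candidates])
    scores = torch.matmul(cand, query.squeeze(0))               # similarity scores
    probs = torch.softmax(scores, dim=0)                        # selection policy

    top_idx = torch.topk(probs, k).indices                      # top-k demonstrations
    demos = [candidates[i] for i in top_idx]
    prediction = llm_generate(demos, test_case)                 # frozen LLM generator

    # Reward from the LLM output; REINFORCE-style update of the selector only.
    reward = scorer.score(test_case["rewrite"], prediction)["rougeL"].fmeasure
    loss = -reward * torch.log(probs[top_idx]).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return prediction, reward
```

The key design choice mirrored here is that only the selector receives gradients: the LLM generator is treated as a black box whose output quality supplies the reward signal.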

Experimental Results

The authors evaluated their framework against several competitive baselines, including sparse retrieval methods like BM25 and dense retrieval methods like KATE, EPR, and BSR. The empirical results were obtained on three benchmark datasets: CANARD (English conversational question answering), TASK (task-oriented English dialogues), and REWRITE (open-domain Chinese dialogues).

Numerical Results

On the CANARD dataset with 5-shot demonstrations, the proposed method outperformed existing selection methods by approximately 1.2-1.7 ROUGE points, 1.3-2.5 BLEU points, 0.4 F2 points, and 0.6 F3 points. Substantial gains were likewise reported on the TASK and REWRITE datasets. This consistent improvement across multiple datasets illustrates the effectiveness of RL-based example selection in enhancing the ICL capabilities of LLMs.
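For reference, ROUGE and BLEU are standard surface-overlap metrics between the generated rewrite and the gold rewrite; a minimal way to compute them with the rouge_score and sacrebleu packages is sketched below. The paper's exact metric variants and tokenization, as well as the n-gram restoration F2/F3 scores, are not reproduced here.

```python
# Sketch of the surface-overlap metrics; metric variants and tokenization are assumptions.
import sacrebleu
from rouge_score import rouge_scorer

def surface_metrics(prediction, reference):
    rouge = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    rouge_f = {k: v.fmeasure for k, v in rouge.score(reference, prediction).items()}
    bleu = sacrebleu.sentence_bleu(prediction, [reference]).score
    return {"bleu": bleu, **rouge_f}

print(surface_metrics("When did Marie Curie move to Paris?",
                      "When did Marie Curie move to Paris?"))
```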

Implications and Future Directions

The implications of this research are significant for the improvement of LLM performance in tasks involving incomplete information, such as dialogue systems and conversational agents. By directly incorporating feedback from the LLM into the example selection process, the proposed method enhances the model's analogy capability and overall performance in generating contextually appropriate utterances.

The framework showcases potential areas for future work, including:

  • Scalability: Further testing and refinement with larger datasets and stronger LLMs such as GPT-4 or ChatGPT to probe the limits of the method's scalability and robustness.
  • Transfer Learning: Exploring the cross-application of this method to other NLP tasks such as machine translation and summarization.
  • Efficiency Improvements: Investigating optimized versions of the RL component to reduce computational overhead and enhance real-time adaptability.

Conclusion

The reinforcement learning-based example selection framework presented in this paper demonstrates a substantial advance in improving LLM performance on the IUR task without fine-tuning the LLM itself. By delivering clear gains over both sparse and dense retrieval methods, the paper shows that feedback-driven learning offers a viable path for enhancing the in-context learning capabilities of LLMs. This work contributes a nuanced understanding of how to select appropriate examples to maximize LLM efficacy, with implications extending across various applications in natural language processing.

Authors (2)
  1. Haowei Du (7 papers)
  2. Dongyan Zhao (144 papers)