Unsupervised Large Language Model Alignment for Information Retrieval via Contrastive Feedback (2309.17078v2)

Published 29 Sep 2023 in cs.IR

Abstract: LLMs have demonstrated remarkable capabilities across various research domains, including the field of Information Retrieval (IR). However, the responses generated by off-the-shelf LLMs tend to be generic, i.e., they cannot capture the distinctiveness of each document among documents with similar content. This limits the performance of LLMs in IR because finding and distinguishing relevant documents from substantially similar documents is a typical problem in many IR tasks. To address this issue, we propose an unsupervised alignment method, namely Reinforcement Learning from Contrastive Feedback (RLCF), empowering LLMs to generate both high-quality and context-specific responses. Our approach constructs unsupervised contrastive feedback signals based on similar document groups, and adopts a reward function, named group-wise reciprocal rank, to optimize LLMs within a standard Proximal Policy Optimization framework. We conduct extensive experiments to evaluate the effectiveness of RLCF on LLMs built with different languages and parameter sizes on multiple downstream IR applications. RLCF significantly outperforms existing alignment methods, and RLCF-optimized LLMs demonstrate considerable improvement in generating responses with distinctiveness.
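
To make the group-wise reciprocal rank reward mentioned in the abstract concrete, here is a minimal Python sketch. It assumes the generated response for a target document is scored against every document in a group of similar documents by some relevance scorer, and that the reward is the reciprocal of the target document's rank within that group; the scorer (`overlap_score`) and function names are illustrative stand-ins, not the paper's actual retriever or API, and the paper's exact formulation may differ.

```python
from typing import Callable, Sequence


def group_wise_reciprocal_rank(
    response: str,
    target_doc: str,
    similar_docs: Sequence[str],
    score: Callable[[str, str], float],
) -> float:
    """Sketch of a group-wise reciprocal rank reward.

    The generated response is scored against the target document and its
    group of similar documents; the reward is 1 / rank of the target.
    A context-specific response should match its own document best,
    yielding rank 1 and reward 1.0.
    """
    group = [target_doc, *similar_docs]
    scores = [score(response, doc) for doc in group]
    target_score = scores[0]
    # 1-based rank of the target: one plus the number of similar documents
    # that the scorer prefers over the target for this response.
    rank = 1 + sum(s > target_score for s in scores[1:])
    return 1.0 / rank


# Toy usage with a hypothetical lexical-overlap scorer (not the paper's retriever).
def overlap_score(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)


reward = group_wise_reciprocal_rank(
    response="solar panel efficiency in cold climates",
    target_doc="A study of solar panel efficiency in cold climates.",
    similar_docs=["A study of solar panel efficiency in hot climates."],
    score=overlap_score,
)
print(reward)  # 1.0 when the response matches its own document best
```

In an RLCF-style setup, a scalar reward of this form would be fed to a PPO trainer as the return for each generated response, so that responses which fail to separate their source document from near-duplicates receive lower reward.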

Authors (9)
  1. Qian Dong (25 papers)
  2. Yiding Liu (30 papers)
  3. Qingyao Ai (113 papers)
  4. Zhijing Wu (21 papers)
  5. Haitao Li (65 papers)
  6. Yiqun Liu (131 papers)
  7. Shuaiqiang Wang (68 papers)
  8. Dawei Yin (165 papers)
  9. Shaoping Ma (39 papers)
Citations (4)
