Aligning Language Models with Offline Learning from Human Feedback (2308.12050v2)
Abstract: Learning from human preferences is crucial for language models (LMs) to effectively cater to human needs and societal values. Previous research has made notable progress by leveraging human feedback to teach models to follow instructions. However, these approaches rely primarily on online learning techniques such as Proximal Policy Optimization (PPO), which have proven unstable and challenging to tune for LLMs. Moreover, PPO requires a complex distributed system implementation, which hinders the efficiency of large-scale distributed training. In this study, we propose an offline learning-from-human-feedback framework that aligns LMs without interacting with the environment. Specifically, we explore filtering alignment (FA), reward-weighted regression (RWR), and conditional alignment (CA) to align LLMs with human preferences. By employing a loss function similar to supervised fine-tuning, our methods yield more stable training than PPO, require only a simple machine learning system (MLSys), and use far fewer (around 9%) computing resources. Experimental results demonstrate that conditional alignment outperforms the other offline alignment methods and is comparable to PPO.
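All three offline methods share an SFT-style training loop over a fixed, reward-annotated dataset. The sketch below is a minimal PyTorch illustration, not the paper's exact recipe: the function name, the exp(reward / beta) weighting, and the hyperparameters are assumptions. It shows a reward-weighted cross-entropy loss in the spirit of RWR; FA corresponds to the special case of uniform weights over samples kept above a reward threshold, and CA instead conditions the prompt on a reward or quality token and trains with the plain SFT loss.

```python
# Hedged sketch (not the paper's exact implementation): an SFT-style,
# reward-weighted cross-entropy loss in the spirit of reward-weighted
# regression (RWR). All names and the exp(reward / beta) weighting are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def offline_alignment_loss(logits, labels, rewards, beta=1.0, ignore_index=-100):
    """
    logits:  (B, T, V) next-token logits from the LM
    labels:  (B, T)    target token ids, with ignore_index on prompt/padding positions
    rewards: (B,)      scalar reward per response (e.g., from a reward model)
    """
    # Per-token cross-entropy, kept unreduced so it can be weighted per sequence.
    per_token = F.cross_entropy(
        logits.transpose(1, 2), labels,
        ignore_index=ignore_index, reduction="none",
    )  # (B, T)
    mask = (labels != ignore_index).float()
    per_seq = (per_token * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)  # (B,)

    # RWR-style weights: higher-reward responses contribute more to the gradient.
    # Filtering alignment (FA) is the special case of weight 1 for samples above a
    # reward threshold and 0 otherwise; conditional alignment (CA) instead prepends
    # a reward/quality token to the prompt and uses uniform weights.
    weights = torch.softmax(rewards / beta, dim=0)
    return (weights * per_seq).sum()
```

Because this objective is just a weighted supervised loss over offline data, it trains with a standard data-parallel setup and needs no rollout workers or critic network, which is where the stability and resource advantages over PPO claimed in the abstract come from.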
- Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.
- Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901.
- Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality.
- Free Dolly: Introducing the world's first truly open instruction-tuned LLM.
- FlashAttention: Fast and memory-efficient exact attention with IO-awareness. Advances in Neural Information Processing Systems, 35:16344–16359.
- Where did my optimum go?: An empirical analysis of gradient descent optimization in policy gradient methods. arXiv preprint arXiv:1810.02525.
- Rethinking the implementation tricks and monotonicity constraint in cooperative multi-agent reinforcement learning. arXiv preprint arXiv:2102.03479.
- The 37 implementation details of proximal policy optimization. In ICLR Blog Track. https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/.
- Reproducibility of benchmarked deep reinforcement learning tasks for continuous control. arXiv preprint arXiv:1708.04133.
- Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- kipply. 2022. Transformer inference arithmetic. https://kipp.ly/transformer-inference-arithmetic/.
- OpenAssistant Conversations – democratizing large language model alignment. arXiv preprint arXiv:2304.07327.
- Evaluating modern GPU interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect. IEEE Transactions on Parallel and Distributed Systems, 31(1):94–110.
- NVIDIA. 2023. NeMo-Aligner: Scalable toolkit for efficient model alignment. https://github.com/NVIDIA/NeMo-Aligner.
- OpenAI. 2022. Introducing ChatGPT. https://openai.com/blog/chatgpt.
- OpenAI. 2023. GPT-4 technical report.
- Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
- Jan Peters and Stefan Schaal. 2007. Reinforcement learning by reward-weighted regression for operational space control. In Proceedings of the 24th international conference on Machine learning, pages 745–750.
- ZeRO: Memory optimizations toward training trillion parameter models. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–16. IEEE.
- High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- Noam Shazeer. 2020. GLU variants improve transformer. arXiv preprint arXiv:2002.05202.
- Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053.
- Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.
- RoFormer: Enhanced transformer with rotary position embedding. arXiv preprint arXiv:2104.09864.
- Han Vanholder. 2016. Efficient inference with TensorRT. In GPU Technology Conference, volume 1.
- Self-Instruct: Aligning language models with self-generated instructions. arXiv preprint arXiv:2212.10560.
- Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652.
- DeepSpeed-Chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales.
- Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. arXiv preprint arXiv:2306.05685.
Authors: Jian Hu, Li Tao, June Yang, Chandler Zhou