Multi-Objective Reinforcement Learning from AI Feedback (2406.07295v2)

Published 11 Jun 2024 in cs.LG

Abstract: This paper presents Multi-Objective Reinforcement Learning from AI Feedback (MORLAIF), a novel approach to improving the alignment and performance of LLMs trained using reinforcement learning from AI feedback (RLAIF). In contrast to standard approaches that train a single preference model to represent all human preferences, MORLAIF decomposes this task into multiple simpler principles, such as toxicity, factuality, and sycophancy. Separate preference models are trained for each principle using feedback from GPT-3.5-Turbo. These preference model scores are then combined using different scalarization functions to provide a reward signal for Proximal Policy Optimization (PPO) training of the target LLM. Our experiments indicate that MORLAIF outperforms the standard RLAIF baselines and that MORLAIF can be used to align larger LLMs using smaller ones. Surprisingly, the choice of scalarization function does not appear to significantly impact the results.
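The reward construction the abstract describes can be sketched in a few lines: score a response under each principle-specific preference model, then scalarize the per-principle scores into a single reward for PPO. The sketch below is a minimal illustration under assumed names; the preference models are stubs and the two scalarization functions (weighted sum and worst-case) are common choices, not confirmed details of the paper's implementation.

```python
from typing import Callable, Dict

# Hypothetical principle-specific preference models: each maps a candidate
# response to a scalar score, higher meaning better on that principle
# (e.g. low toxicity, high factuality, low sycophancy).
PreferenceModel = Callable[[str], float]

def scalarize_weighted_sum(scores: Dict[str, float],
                           weights: Dict[str, float]) -> float:
    """Linear scalarization: reward = sum_i w_i * s_i."""
    return sum(weights[name] * s for name, s in scores.items())

def scalarize_worst_case(scores: Dict[str, float]) -> float:
    """Max-min scalarization: reward is the lowest principle score."""
    return min(scores.values())

def morlaif_reward(response: str,
                   models: Dict[str, PreferenceModel],
                   weights: Dict[str, float]) -> float:
    """Score a response under every principle model, then scalarize
    the scores into the single reward signal used for PPO training."""
    scores = {name: model(response) for name, model in models.items()}
    return scalarize_weighted_sum(scores, weights)

# Usage with stub models standing in for the trained preference models:
if __name__ == "__main__":
    models = {
        "toxicity": lambda r: 0.9,    # stub score: response is non-toxic
        "factuality": lambda r: 0.7,
        "sycophancy": lambda r: 0.8,
    }
    weights = {"toxicity": 0.4, "factuality": 0.4, "sycophancy": 0.2}
    print(morlaif_reward("some model response", models, weights))
```

Consistent with the abstract's finding that the choice of scalarization function does not significantly affect results, swapping `scalarize_weighted_sum` for `scalarize_worst_case` above would be expected to yield similar downstream behavior.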

Authors (1)
  1. Marcus Williams