Bayesian WeakS-to-Strong from Text Classification to Generation (2406.03199v2)

Published 24 May 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Advances in LLMs raise the question of how alignment techniques will adapt as models become increasingly complex and humans will only be able to supervise them weakly. Weak-to-Strong mimics such a scenario where weak model supervision attempts to harness the full capabilities of a much stronger model. This work extends Weak-to-Strong to WeakS-to-Strong by exploring an ensemble of weak models which simulate the variability in human opinions. Confidence scores are estimated using a Bayesian approach to guide the WeakS-to-Strong generalization. Furthermore, we extend the application of WeakS-to-Strong from text classification tasks to text generation tasks where more advanced strategies are investigated for supervision. Moreover, direct preference optimization is applied to advance the student model's preference learning, beyond the basic learning framework of teacher forcing. Results demonstrate the effectiveness of the proposed approach for the reliability of a strong student model, showing potential for superalignment.

Authors (5)
  1. Ziyun Cui
  2. Ziyang Zhang
  3. Wen Wu
  4. Guangzhi Sun
  5. Chao Zhang