RL versus SFT for alignment
Determine whether reinforcement learning (RL) is more suitable than supervised fine-tuning (SFT) for aligning large language models, even when SFT is supplied with high-quality demonstrations, and characterize how the two paradigms fundamentally differ in shaping model behavior.
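To make the contrast concrete, the two objectives can be written side by side. This is a standard formalization rather than notation from the source; the reward model $r$, reference policy $\pi_{\mathrm{ref}}$, and KL coefficient $\beta$ are the usual RLHF ingredients, assumed here for illustration:

$$
\mathcal{L}_{\mathrm{SFT}}(\theta) = -\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\log \pi_\theta(y \mid x)\big]
$$

$$
\mathcal{J}_{\mathrm{RL}}(\theta) = \mathbb{E}_{x\sim\mathcal{D}}\Big[\,\mathbb{E}_{y\sim\pi_\theta(\cdot\mid x)}\big[r(x,y)\big] \;-\; \beta\,\mathrm{KL}\big(\pi_\theta(\cdot\mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)\Big]
$$

SFT fits a fixed demonstration distribution $\mathcal{D}$, so its gradient only ever reinforces tokens that appear in the demonstrations; RL optimizes over the model's own samples $y\sim\pi_\theta$, so it can also penalize behaviors that no demonstration exhibits. This off-policy versus on-policy distinction is one concrete axis along which the question can be studied.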
References
A core open question addresses whether RL is more suitable for alignment than SFT, even when the latter is supplied with high-quality demonstrations, and how these two paradigms fundamentally differ in shaping model behavior.
— Beyond the Black Box: Theory and Mechanism of Large Language Models
(2601.02907 - Gan et al., 6 Jan 2026) in Subsubsection Relationship between Training and Alignment, Section 5: Alignment Stage (Advanced Topics and Open Questions)