Conjecture on SFT data quality causing declines in some benchmarks
Ascertain whether the observed declines on certain benchmarks, such as MMLU, after supervised fine-tuning of LLaDA 8B are due to the suboptimal quality of the supervised fine-tuning dataset.
References
A few metrics, such as MMLU, showed declines, which we conjecture may be due to the suboptimal quality of the SFT data.
— Large Language Diffusion Models
(2502.09992 - Nie et al., 14 Feb 2025) in Benchmark Results, Section 4.2