Conjecture on the source of DecLLM’s superior pretraining performance
Establish whether the superior pretraining performance of decoder-only language models trained with a causal language modeling objective primarily arises from alignment between the causal LM training objective and downstream evaluation protocols rather than from intrinsically greater modeling capability.
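To make the notion of objective-evaluation alignment concrete: many common downstream evaluation protocols (e.g., zero-shot multiple-choice scoring) rank candidate answers by their next-token log-likelihood under the model, which is exactly the quantity a causal LM is trained to maximize during pretraining. The sketch below illustrates this scoring style with a generic Hugging Face causal LM; the model name, question, and candidate answers are placeholders for illustration and are not drawn from the paper.

```python
# Minimal sketch of log-likelihood-based zero-shot scoring, assuming a
# Hugging Face causal LM ("gpt2" is a placeholder, not the paper's DecLLM).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of next-token log-probabilities of `continuation` given `context`.

    This is the same quantity the causal LM objective maximizes during
    pretraining, which is the sense in which decoder-only models are
    "aligned" with this style of downstream evaluation.
    """
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # [1, seq_len, vocab]
    log_probs = F.log_softmax(logits, dim=-1)
    # Assumes the context tokenization is a prefix of the full tokenization
    # (typical for BPE when the continuation starts with a space).
    cont_start = ctx_ids.shape[1]
    target_ids = full_ids[0, cont_start:]
    # Logits at position t-1 predict the token at position t.
    pred_log_probs = log_probs[0, cont_start - 1 : full_ids.shape[1] - 1, :]
    return pred_log_probs.gather(1, target_ids.unsqueeze(1)).sum().item()

# Multiple-choice scoring: pick the candidate with the highest log-likelihood.
question = "The capital of France is"
candidates = [" Paris.", " Berlin.", " Madrid."]
scores = {c: continuation_logprob(question, c) for c in candidates}
print(max(scores, key=scores.get))
```

Under the conjecture, a decoder-only model benefits on such protocols simply because the evaluation metric coincides with its pretraining loss, so closing this gap (e.g., by finetuning an encoder-decoder model) would narrow or erase the apparent advantage.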
References
We conjecture that the superior pretraining performance of DecLLM is mostly caused by the higher degree of matching between its pretraining objective and the downstream evaluation, rather than its stronger capability.
— Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model
(2510.26622 - Zhang et al., 30 Oct 2025) in Section 6, subsection "RedLLM shows high adaptability: matching and even surpassing DecLLM across scales after finetuning"