Performance of non-causal patch boundaries when training from scratch
Determine how non-causal patch boundary prediction in Latent Tokenizer Language Models (such as the Bolmo architecture) performs when models are trained from scratch. Establish whether the added expressivity of the boundary predictor is generally beneficial relative to causal boundary prediction, across both standard language modeling and downstream tasks.
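To make the causal versus non-causal distinction concrete, here is a minimal toy sketch in Python. It is not the Bolmo boundary predictor: the space-based rules, the one-byte lookahead, and the function names `causal_boundaries`, `noncausal_boundaries`, and `split_patches` are all hypothetical, chosen only to show what extra information a predictor gains when it may look at future bytes.

```python
def causal_boundaries(data: bytes) -> list[bool]:
    """Toy causal rule (assumption): byte i ends a patch if byte i is a space.

    The decision at position i uses only data[: i + 1].
    """
    return [b == 0x20 for b in data]


def noncausal_boundaries(data: bytes) -> list[bool]:
    """Toy non-causal rule (assumption): byte i ends a patch if the NEXT byte
    is a space, or if i is the last byte.

    The decision at position i peeks one byte ahead, at data[i + 1].
    """
    n = len(data)
    return [i + 1 == n or data[i + 1] == 0x20 for i in range(n)]


def split_patches(data: bytes, ends: list[bool]) -> list[bytes]:
    """Cut data into patches, closing a patch after every flagged position."""
    patches, start = [], 0
    for i, is_end in enumerate(ends):
        if is_end:
            patches.append(data[start : i + 1])
            start = i + 1
    if start < len(data):  # trailing bytes with no closing boundary
        patches.append(data[start:])
    return patches


text = b"a bc d"
causal_patches = split_patches(text, causal_boundaries(text))
noncausal_patches = split_patches(text, noncausal_boundaries(text))
print(causal_patches)     # [b'a ', b'bc ', b'd']
print(noncausal_patches)  # [b'a', b' bc', b' d']
```

In this toy, the causal rule can only close a patch once it has consumed the space, while the lookahead rule can close the patch at the end of the word. Whether this kind of extra expressivity helps a model trained from scratch is exactly what the problem statement leaves open.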
References
For example, we have not assessed how non-causal patch boundaries perform when training from scratch. We expect that the increased expressivity of the boundary predictor might be generally useful, but we do not yet know.
— Bolmo: Byteifying the Next Generation of Language Models
(2512.15586 - Minixhofer et al., 17 Dec 2025) in Section 7 (Future Directions), Bit 0