Characterize factors governing the effectiveness and stability of embedding scaling
Determine how four factors jointly influence the effectiveness and training stability of scaling embedding parameters in large language models (e.g., via N-gram Embedding): the total parameter budget allocated to embedding parameters, the n-gram vocabulary size, the initialization schemes for embedding tables and projection matrices, and the trade-off between transformer width and depth.
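To make the interaction between these factors concrete, the following is a minimal accounting sketch, not the paper's method. It assumes a hypothetical layout in which an n-gram table of shape (ngram_vocab_size, embed_dim) feeds a projection of shape (embed_dim, d_model), alongside a standard token embedding table, and it uses the common dense-transformer approximation of about 12 * d_model^2 parameters per layer. The function name and all of its parameters are illustrative assumptions.

```python
def embedding_param_share(ngram_vocab_size, embed_dim, d_model, n_layers, base_vocab_size):
    """Rough parameter accounting for an embedding-scaled model.

    Assumed (hypothetical) layout, not taken from the paper:
      - n-gram table:  ngram_vocab_size x embed_dim
      - projection:    embed_dim x d_model
      - token table:   base_vocab_size x d_model
      - backbone:      ~12 * d_model^2 per layer
                       (4 * d^2 for attention, 8 * d^2 for a 4x-wide MLP)
    """
    ngram_table = ngram_vocab_size * embed_dim
    projection = embed_dim * d_model
    token_table = base_vocab_size * d_model
    backbone = 12 * n_layers * d_model ** 2

    embedding_params = ngram_table + projection + token_table
    total_params = embedding_params + backbone
    return {
        "embedding_params": embedding_params,
        "backbone_params": backbone,
        "total_params": total_params,
        "embedding_share": embedding_params / total_params,
    }


if __name__ == "__main__":
    # Two backbones with the same approximate size (12 * L * d^2):
    # a wider, shallower one and a narrower, deeper one.
    wide = embedding_param_share(1_000_000, 256, 2048, 12, 50_000)
    deep = embedding_param_share(1_000_000, 256, 1024, 48, 50_000)
    for name, stats in (("wide", wide), ("deep", deep)):
        print(name, stats["total_params"], round(stats["embedding_share"], 3))
```

Under these assumptions, changing width at a fixed backbone size also changes the token-table and projection sizes, so the embedding share of the budget shifts even when the n-gram vocabulary is held constant. This is one reason the factors are hard to study in isolation.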
References
Second, the constraints of scaling embeddings are still not systematically characterized: it remains unclear how factors such as the total parameter budget, vocabulary size, initialization schemes, and the trade-offs between model width and depth jointly influence the effectiveness and stability of embedding scaling.
— Scaling Embeddings Outperforms Scaling Experts in Language Models
(2601.21204 - Liu et al., 29 Jan 2026) in Section 1 Introduction