Optimize Text Degree Distributions for Concept Learning
Optimize the degree distribution of text nodes in the concept–text bipartite graph, which reflects architectural design choices, to maximize the expected number of concepts learned under a fixed compute budget.
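To make the objective concrete, the following toy Monte Carlo sketch compares candidate text degree distributions by the number of concepts they allow a learner to recover. It rests on several assumptions that are not stated on this page: each text connects to a uniformly random set of concepts, a concept is learned by a peeling-style rule (a text with exactly one unlearned concept teaches it), and the number of texts stands in for the compute budget. All function and parameter names are hypothetical and chosen for illustration only.

```python
import random
from collections import defaultdict


def simulate_learned_concepts(degree_probs, n_concepts, n_texts, seed=0):
    """Count concepts learned by peeling on a random concept-text graph.

    degree_probs maps a text degree to its probability (illustrative input).
    The peeling rule is an assumed stand-in for the learning process.
    """
    rng = random.Random(seed)
    degrees = list(degree_probs)
    weights = [degree_probs[d] for d in degrees]

    texts = []                            # text index -> set of unlearned concepts
    concept_to_texts = defaultdict(list)  # concept -> texts that mention it
    for t in range(n_texts):
        d = rng.choices(degrees, weights=weights)[0]
        concepts = set(rng.sample(range(n_concepts), d))
        texts.append(concepts)
        for c in concepts:
            concept_to_texts[c].append(t)

    learned = set()
    # Start from texts that mention exactly one concept.
    stack = [t for t, cs in enumerate(texts) if len(cs) == 1]
    while stack:
        t = stack.pop()
        if len(texts[t]) != 1:
            continue  # stale entry: its last concept was learned elsewhere
        (c,) = texts[t]
        learned.add(c)
        # Remove the learned concept from every text that mentions it.
        for u in concept_to_texts[c]:
            texts[u].discard(c)
            if len(texts[u]) == 1:
                stack.append(u)
    return len(learned)


def mean_learned(degree_probs, n_concepts, n_texts, n_trials=20):
    """Average the learned-concept count over several random graphs."""
    total = sum(
        simulate_learned_concepts(degree_probs, n_concepts, n_texts, seed=s)
        for s in range(n_trials)
    )
    return total / n_trials


if __name__ == "__main__":
    # Hypothetical candidate distributions to compare at a fixed text count.
    candidates = {
        "all degree 1": {1: 1.0},
        "mixed 1/2/3": {1: 0.3, 2: 0.5, 3: 0.2},
    }
    for name, probs in candidates.items():
        print(name, mean_learned(probs, n_concepts=200, n_texts=300))
```

In this sketch, a search over `degree_probs` at a fixed `n_texts` would play the role of the optimization described above. A faithful treatment would replace both the peeling rule and the budget proxy with the definitions used in the cited paper.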
References
There are some open questions and considerations worth exploring. Evidently, the degree distribution of texts is related to the model's architecture. Therefore, optimizing the degree distribution enables a LLM to learn more concepts from text pieces.
— An Information Theory of Compute-Optimal Size Scaling, Emergence, and Plateaus in Language Models
(2410.01243 - Nayak et al., 2 Oct 2024) in Conclusion, final paragraph