Generalizing Scaling Laws for Dense and Sparse Large Language Models

Published 8 Aug 2025 in cs.LG, cs.AI, and cs.PF | arXiv:2508.06617v1

Abstract: Over the past few years, the size of LLMs has grown exponentially, and with it the computational cost of training them. This rapid growth has motivated researchers to develop new techniques for making training more efficient. Despite these advances, predicting the optimal model size and allocating training resources remain challenging. Several efforts have addressed this challenge by proposing scaling laws, but almost all of them are architecture-specific, targeting either dense or sparse models. In this work, we revisit existing scaling laws and propose a generalized scaling law that provides a unified framework applicable to both dense and sparse LLMs. We evaluate our proposed law against existing scaling laws to demonstrate its effectiveness.
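
The abstract does not spell out the functional form of the proposed generalized law, so as background only, the sketch below fits the well-known Chinchilla-style dense scaling law L(N, D) = E + A/N^alpha + B/D^beta (Hoffmann et al., 2022) to synthetic data with SciPy. The constants, data, and variable names are illustrative assumptions, not results or code from this paper; a generalized dense/sparse law would presumably add architecture-dependent terms (e.g., a sparsity factor) to this baseline form.

```python
# Minimal sketch: fitting a Chinchilla-style scaling law
#   L(N, D) = E + A / N**alpha + B / D**beta
# to synthetic (model size, token count, loss) observations.
# All numbers here are illustrative, not taken from the paper.

import numpy as np
from scipy.optimize import curve_fit

def loss(X, E, A, alpha, B, beta):
    """Predicted training loss given parameter count N and token count D."""
    N, D = X
    return E + A / N**alpha + B / D**beta

# Synthetic observations standing in for real training runs.
rng = np.random.default_rng(0)
N = rng.uniform(1e7, 1e10, size=64)    # model parameter counts
D = rng.uniform(1e9, 1e12, size=64)    # training token counts
true = loss((N, D), E=1.69, A=406.4, alpha=0.34, B=410.7, beta=0.28)
observed = true + rng.normal(0.0, 0.01, size=64)  # add measurement noise

# Fit the five coefficients from a rough initial guess.
p0 = (1.5, 400.0, 0.3, 400.0, 0.3)
params, _ = curve_fit(loss, (N, D), observed, p0=p0, maxfev=20000)
E, A, alpha, B, beta = params
print(f"E={E:.3f}, A={A:.1f}, alpha={alpha:.3f}, B={B:.1f}, beta={beta:.3f}")
```

Once fitted, such a law lets one trade off N against D at a fixed compute budget, which is the resource-allocation question the paper targets for both dense and sparse architectures.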
