Compression Scaling Laws: Unifying Sparsity and Quantization (2502.16440v1)

Published 23 Feb 2025 in cs.LG and cs.CL

Abstract: We investigate how different compression techniques -- such as weight and activation quantization, and weight sparsity -- affect the scaling behavior of LLMs during pretraining. Building on previous work showing that weight sparsity acts as a constant multiplier on model size in scaling laws, we demonstrate that this "effective parameter" scaling pattern extends to quantization as well. Specifically, we establish that weight-only quantization achieves strong parameter efficiency multipliers, while full quantization of both weights and activations shows diminishing returns at lower bitwidths. Our results suggest that different compression techniques can be unified under a common scaling law framework, enabling principled comparison and combination of these methods.
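
To make the "effective parameter" framing concrete, the sketch below shows one plausible formalization: a Chinchilla-style parametric loss in which a compression configuration scales the parameter count by a constant multiplier. This is an illustrative assumption based on the abstract, not the paper's fitted law; the `loss` function, the constants (borrowed from the Chinchilla-style form for exposition), and all `multiplier` values are placeholders.

```python
# Hedged sketch of an "effective parameter" scaling law, assuming a
# Chinchilla-style loss L(N, D) = E + A / N^alpha + B / D^beta.
# The functional form and all constants are illustrative, not the
# paper's fitted results.

def loss(n_params: float, n_tokens: float,
         multiplier: float = 1.0,
         E: float = 1.69, A: float = 406.4, B: float = 410.7,
         alpha: float = 0.34, beta: float = 0.28) -> float:
    """Parametric loss where compression acts as a constant multiplier
    on model size: N_eff = multiplier * N."""
    n_eff = multiplier * n_params
    return E + A / n_eff**alpha + B / n_tokens**beta

# Hypothetical parameter-efficiency multipliers for a few compression
# configurations. The values are placeholders chosen to mirror the
# pattern the abstract describes: weight-only quantization retains a
# strong multiplier, while full weight+activation quantization
# degrades at low bitwidths.
configs = {
    "dense fp16":          1.00,
    "50% weight sparsity": 0.70,
    "4-bit weight-only":   0.90,
    "4-bit weights+acts":  0.55,
}

N, D = 1e9, 2e10  # 1B parameters trained on 20B tokens (illustrative)
for name, m in configs.items():
    print(f"{name:>20}: N_eff = {m * N:.2e}, loss = {loss(N, D, m):.4f}")
```

Under this framing, comparing two compression techniques reduces to comparing their multipliers: a method with multiplier m behaves, for scaling purposes, like an uncompressed model with m times as many parameters.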
