GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression (2501.00339v3)

Published 31 Dec 2024 in cs.CL and cs.LG

Abstract: Recent studies have demonstrated that many layers are functionally redundant in LLMs, enabling model compression by removing these layers to reduce inference cost. While such approaches can improve efficiency, indiscriminate layer pruning often results in significant performance degradation. In this paper, we propose GRASP (Gradient-based Retention of Adaptive Singular Parameters), a novel compression framework that mitigates this issue by preserving sensitivity-aware singular values. Unlike direct layer pruning, GRASP leverages gradient-based attribution on a small calibration dataset to adaptively identify and retain critical singular components. By replacing redundant layers with only a minimal set of parameters, GRASP achieves efficient compression while maintaining strong performance with minimal overhead. Experiments across multiple LLMs show that GRASP consistently outperforms existing compression methods, achieving 90% of the original model's performance under a 20% compression ratio.
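The abstract gives no formulas, so the following is only a minimal sketch of what "gradient-based attribution over singular components" could look like, not the authors' implementation. It assumes a first-order sensitivity score |σ_i · ∂L/∂σ_i|, using the standard identity ∂L/∂σ_i = u_iᵀ G v_i, where G is the loss gradient with respect to the weight matrix. The function name, the scoring rule, and the fixed budget `k` are all hypothetical choices made for illustration.

```python
import numpy as np

def retain_sensitive_singular_components(weight, grad, k):
    """Keep the k singular components of `weight` with the largest
    first-order sensitivity score (an assumed criterion, not taken from the paper).

    weight: (m, n) weight matrix
    grad:   (m, n) loss gradient w.r.t. `weight`, e.g. from a calibration batch
    k:      number of singular components to retain
    """
    # Thin SVD: weight = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(weight, full_matrices=False)

    # dL/d(sigma_i) = u_i^T G v_i for each singular triplet i
    sigma_grad = np.einsum("mi,mn,in->i", u, grad, vt)

    # First-order estimate of the loss change if sigma_i were zeroed out
    scores = np.abs(s * sigma_grad)

    # Indices of the k highest-scoring components, kept in ascending order
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return u[:, keep], s[keep], vt[keep, :], keep


# Usage with random stand-ins for a layer weight and its calibration gradient
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
G = rng.standard_normal((8, 6))
U_k, s_k, Vt_k, kept = retain_sensitive_singular_components(W, G, k=2)
W_compact = (U_k * s_k) @ Vt_k  # rank-2 replacement built from the retained components
```

Note that this selection differs from plain truncated SVD: a component with a small singular value can still be retained if the loss is sensitive to it.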

Authors (6)
