- The paper demonstrates that pruning causes catastrophic collapse in VLA models, drastically reducing task success and increasing unsafe behaviors.
- The proposed GLUESTICK method uses singular value stitching to reintroduce lost weight contributions without retraining.
- Experimental results reveal substantial recovery in performance and safety across manipulation and navigation tasks while retaining memory efficiency.
Pruning-Induced Collapse and Recovery in Vision-Language-Action Models
Introduction
Vision-Language-Action (VLA) models represent a paradigm shift in robotics, integrating perception, language understanding, and action generation into unified, end-to-end transformer-based policies. These models leverage large-scale robotics datasets and pretrained vision/language backbones to generalize across diverse tasks and environments. However, their substantial parameter count poses significant challenges for deployment on resource-constrained robotic hardware, necessitating model compression techniques such as pruning. While pruning has proven effective for LLMs, this paper provides the first systematic study demonstrating that pruning induces catastrophic degradation in VLA models, both in terms of task success and safety. The authors introduce GLUESTICK, a post-pruning, training-free recovery method that restores much of the lost functionality while retaining the efficiency benefits of structured sparsity.
Pruning in VLA Models: Empirical Collapse
The study reveals that standard pruning algorithms, including Magnitude and Wanda, which are effective for LLMs, result in near-complete collapse of VLA model performance. For instance, pruning OpenVLA and NaVILA to 2:4 structured sparsity (50%) drives success rates from 85.2% (manipulation) and 43.0% (navigation) down to 0.0%, respectively, and substantially increases unsafe-episode rates. This degradation is not merely a reduction in efficiency but a fundamental loss of embodied control capabilities, with pruned agents failing to complete tasks and exhibiting unsafe behaviors such as collisions and object drops.
Spectral analysis of weight matrices reveals that VLA layers exhibit flatter singular value spectra compared to language-only models. In LLMs, energy is concentrated in a few dominant directions, making pruning less destructive. In contrast, VLA models distribute energy across many directions, so pruning even small-magnitude weights discards critical signal, explaining their heightened fragility.
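This flatness can be checked directly on a checkpoint. The sketch below, assuming PyTorch weight matrices and a hypothetical helper name, computes the fraction of squared singular-value energy captured by the top-k directions; a layer with a flat spectrum reports a much smaller fraction than a layer whose energy is concentrated in a few dominant directions.

import torch

def spectral_energy_profile(W, k):
    # Fraction of squared singular-value energy in the top-k directions.
    S = torch.linalg.svdvals(W)
    energy = S ** 2
    return (energy[:k].sum() / energy.sum()).item()

# Illustrative use (layer handles are placeholders, not from the paper):
# spectral_energy_profile(llm_layer.weight, k=50)  # higher for peaked spectra
# spectral_energy_profile(vla_layer.weight, k=50)  # lower for the flatter VLA spectra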
GLUESTICK: Post-Pruning Recovery via Singular Value Stitching
GLUESTICK is a post-hoc, training-free recovery algorithm that operates in weight space and is agnostic to the pruning method. For each pruned linear layer, GLUESTICK computes the gap matrix W_gap = W_dense − W_pruned and performs a truncated SVD to extract the top-r singular components. These components are folded into compact matrices A and B, and at inference the pruned layer's output is corrected as h(x) = W_pruned x + A(B^T x). This approach re-injects dominant lost directions with minimal computational and memory overhead, preserving the efficiency of structured sparsity.
The rank r serves as a hyperparameter controlling the trade-off between memory savings and recovery. Empirically, increasing r improves success-rate recovery at the cost of additional memory, but even moderate values (e.g., r=200 or r=500) yield substantial restoration of performance.
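As a back-of-envelope illustration (assuming fp16 storage and a square 4096x4096 projection, which are illustrative numbers rather than values from the paper), the per-layer cost of the correction terms can be computed directly:

def correction_overhead_mb(d_in, d_out, r, bytes_per_param=2):
    # A is (d_out x r) and B is (d_in x r): (d_in + d_out) * r extra parameters per layer.
    return (d_in + d_out) * r * bytes_per_param / 1e6

# For d_in = d_out = 4096 at fp16:
#   r = 200 -> ~3.3 MB per layer, r = 500 -> ~8.2 MB per layer,
#   compared with ~33.6 MB for the dense weight matrix itself.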
Implementation
GLUESTICK can be integrated into existing PyTorch-based VLA models with minimal code changes. The offline stage computes and stores correction terms for each pruned layer, while the online stage wraps pruned layers to apply the correction during inference. The additional parameters and compute scale as O((d_in + d_out) · r) per layer, which is negligible compared to the dense case.
import torch
import torch.nn as nn
import torch.nn.functional as F

def prime_gluestick(W_dense, W_pruned, r):
    # Offline stage: truncated SVD of the gap between dense and pruned weights.
    W_gap = W_dense - W_pruned
    U, S, Vh = torch.linalg.svd(W_gap, full_matrices=False)
    U_r = U[:, :r]               # (d_out, r) top-r left singular vectors
    S_r = S[:r]                  # (r,)      top-r singular values
    V_r = Vh[:r, :].T            # (d_in, r) top-r right singular vectors
    A = U_r * S_r.unsqueeze(0)   # fold singular values into A: (d_out, r)
    B = V_r                      # (d_in, r)
    return {"A": A, "B": B}

class GLUESTICKWrap(nn.Module):
    def __init__(self, pruned_linear_layer, A, B):
        super().__init__()
        self.pruned_linear = pruned_linear_layer
        # Register as buffers so the correction terms follow .to()/.cuda().
        self.register_buffer("A", A)
        self.register_buffer("B", B)

    def forward(self, x):
        # Online stage: sparse layer output plus the low-rank correction A B^T x.
        y = F.linear(x, self.pruned_linear.weight, self.pruned_linear.bias)
        correction = (x @ self.B) @ self.A.T   # (..., d_in) -> (..., r) -> (..., d_out)
        return y + correction
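A minimal wiring sketch follows; it assumes the dense and pruned checkpoints share module names and that only nn.Linear layers were pruned, and the helper name apply_gluestick is hypothetical rather than taken from the paper's released code.

def apply_gluestick(dense_model, pruned_model, r=500):
    # Offline: gather the dense reference weights by module name.
    dense_weights = {name: m.weight.detach()
                     for name, m in dense_model.named_modules()
                     if isinstance(m, nn.Linear)}
    # Collect targets first so the module tree is not mutated during traversal.
    targets = [(name, m) for name, m in pruned_model.named_modules()
               if isinstance(m, nn.Linear) and name in dense_weights]
    for name, layer in targets:
        terms = prime_gluestick(dense_weights[name], layer.weight.detach(), r)
        parent = (pruned_model.get_submodule(name.rsplit(".", 1)[0])
                  if "." in name else pruned_model)
        setattr(parent, name.rsplit(".", 1)[-1],
                GLUESTICKWrap(layer, terms["A"], terms["B"]))
    return pruned_model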
Experimental Results
Manipulation Tasks
On the LIBERO benchmark, fully sparse pruning yields an average success-rate degradation of 72.4%. GLUESTICK-500 recovers approximately 50% of the lost success, with particularly strong recovery in spatial and goal-oriented tasks (62% and 57%, respectively). Compared to memory-matched baselines, GLUESTICK achieves a 40% improvement in success rate while maintaining similar memory efficiency.
Navigation Tasks
For navigation (VLN-CE-Isaac), pruning collapses NaVILA's success rate from 43.0% to 0%, with pruned agents exhibiting erratic, inefficient trajectories. GLUESTICK-500 fully restores the dense model's performance, matching both success rates and trajectory quality, and maintaining memory savings within 0.38GB of the fully sparse baseline.
Safety
Pruning increases unsafe-episode rates by up to +23.0% in navigation and +13.6% in manipulation. GLUESTICK-500 restores safety profiles to near parity with dense models, with only a minimal +0.4% change across domains. This indicates that the dominant weight-space directions recovered by GLUESTICK are both task-relevant and safety-critical.
Component Sensitivity
Selective pruning experiments show that vision backbones are disproportionately sensitive to pruning, causing outsized harm relative to their parameter count. For OpenVLA, pruning the vision backbone is 4.75x more damaging per million parameters than pruning the language backbone. This suggests that pruning should focus on language components for maximal efficiency with minimal performance loss.
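A component-selective variant of magnitude pruning is straightforward to express. The sketch below applies 2:4 magnitude pruning only to modules whose names contain a given substring; the "language_model" filter is an assumption about module naming, not a detail taken from the paper.

def prune_selected_2to4(model, include_substr="language_model"):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and include_substr in name:
            W = module.weight.data
            d_out, d_in = W.shape
            groups = W.view(d_out, d_in // 4, 4)   # assumes d_in is divisible by 4
            # Keep the two largest-magnitude weights in every group of four.
            idx = groups.abs().topk(2, dim=-1).indices
            mask = torch.zeros_like(groups).scatter_(-1, idx, 1.0)
            module.weight.data = (groups * mask).view(d_out, d_in)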
SVD Compression vs. Pruning
Direct low-rank SVD compression of weights, without pruning, fails to preserve VLA functionality (0% success rate for rank-200 SVD). The pruned weight matrix retains valuable structure that cannot be captured by SVD alone. GLUESTICK leverages this by preserving pruned weights and using SVD only to reintroduce lost directions.
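The distinction can be made concrete with a small sketch: direct compression replaces the dense weight by a rank-r approximation, whereas GLUESTICK keeps the pruned weight and adds a rank-r correction only for the gap (names below are illustrative).

def svd_compress(W_dense, r):
    # Direct low-rank compression: the rank-r approximation replaces W_dense entirely.
    U, S, Vh = torch.linalg.svd(W_dense, full_matrices=False)
    return (U[:, :r] * S[:r]) @ Vh[:r, :]

# GLUESTICK instead keeps W_pruned and corrects only the gap:
#   W_effective = W_pruned + A @ B.T, with A, B = prime_gluestick(W_dense, W_pruned, r)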
Practical and Theoretical Implications
The findings have immediate implications for deploying VLA models on resource-constrained robotic platforms. Pruning recipes validated on LLMs cannot be naively transferred to embodied models without severe loss of functionality and safety. GLUESTICK provides a universal, training-free recovery step that is compatible with any pruning algorithm and exposes a single interpretable hyperparameter for the efficiency-accuracy trade-off. This enables practitioners to adapt VLA models to diverse hardware constraints without retraining or sacrificing safety.
Theoretically, the work highlights the importance of weight-space structure in multimodal models and the limitations of magnitude-based pruning heuristics in settings where energy is distributed across many directions. The spectral analysis suggests that future compression techniques should account for the anisotropy of singular value spectra in different model components.
Future Directions
Potential avenues for further research include:
- Prioritizing recovery of safety-critical directions in weight space.
- Investigating GLUESTICK's impact on inference speed and energy efficiency.
- Dynamically selecting rank r per layer to optimize the recovery-memory trade-off.
- Extending the approach to other multimodal and embodied AI architectures.
Conclusion
This work demonstrates that pruning induces catastrophic collapse in VLA models, fundamentally impairing both task success and safety. GLUESTICK, a training-free, pruning-agnostic recovery method, restores much of the lost functionality while retaining the efficiency benefits of structured sparsity. The approach is practical, easily integrable, and exposes a tunable trade-off between memory and accuracy, making it well-suited for real-world robotic deployment. The results underscore the need for compression techniques tailored to the unique properties of multimodal, embodied models and lay the groundwork for future research in efficient, safe AI for robotics.