
Buffering for Spatial Sparsity (BSS)

Updated 9 December 2025
  • Buffering for Spatial Sparsity (BSS) is a pruning technique that reweights token similarity using normalized spatial distances to balance redundancy reduction with spatial coverage.
  • It employs a centrifugal, parallel greedy selection algorithm with channel screening and selective feature fusion, dynamically adjusting thresholds for orderly token selection.
  • Empirical evaluations demonstrate that BSS retains over 95% accuracy at extreme sparsity while achieving significant inference speedups across various vision-language models.

Buffering for Spatial Sparsity (BSS) is a criterion introduced to address the challenge of efficiently pruning visual tokens in vision-language models (VLMs) while maintaining both redundancy reduction and adequate spatial coverage of semantic content. BSS operates within the VLM-Pruner framework, a training-free, centrifugal (near-to-far) token pruning algorithm that employs a parallel greedy strategy and selective feature fusion to achieve high inference speed with minimal performance degradation at extreme sparsity (Wu et al., 2 Dec 2025).

1. Mathematical Formulation of Buffering for Spatial Sparsity

Suppose an image yields a feature map of size $H \times W$, resulting in $N = H \cdot W$ visual tokens indexed by $i \in \{0,\dots,N-1\}$. Each token $i$ contains a $d_k$-dimensional key vector $K_i \in \mathbb{R}^{d_k}$ and a $d$-dimensional hidden state $H_i \in \mathbb{R}^d$. The algorithm first projects $H$ onto its $q$ highest-variance channels (channel screening), producing $\tilde H \in \mathbb{R}^{N \times q}$.
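As a concrete sketch of the channel-screening step (assuming per-channel variance computed over tokens as the screening statistic; the function name is ours, not the paper's):

```python
import numpy as np

def channel_screening(H, q):
    """Project hidden states onto their q highest-variance channels.

    H: (N, d) array of visual-token hidden states.
    Returns H_tilde of shape (N, q). A sketch; the paper's exact
    variance statistic is an assumption here.
    """
    variances = H.var(axis=0)         # per-channel variance across tokens
    top = np.argsort(variances)[-q:]  # indices of the q largest variances
    return H[:, np.sort(top)]         # keep original channel order
```

Sorting the selected indices keeps the retained channels in their original order, which makes downstream indexing deterministic.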

The pairwise cosine similarity is

$$M_{ij} = \frac{\tilde H_i^\top \tilde H_j}{\|\tilde H_i\|_2\,\|\tilde H_j\|_2}$$

Define each token's 2D grid coordinate $p_i = (x_i, y_i)$, with $x_i = i \bmod W$ and $y_i = \lfloor i/W \rfloor$. The spatial distance is

$$D_{ij}^{(\mathrm{sp})} = \|p_i - p_j\|_2, \qquad D_{\max} = \sqrt{H^2 + W^2}$$
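Under the row-major token ordering above, the grid coordinates and pairwise distances can be computed as follows (a sketch; the function name is ours):

```python
import numpy as np

def spatial_distances(H_grid, W_grid):
    """Pairwise Euclidean distances between token grid positions.

    Tokens are indexed row-major: x_i = i mod W, y_i = i // W.
    Returns D of shape (N, N) and D_max = sqrt(H^2 + W^2).
    """
    idx = np.arange(H_grid * W_grid)
    coords = np.stack([idx % W_grid, idx // W_grid], axis=1).astype(float)
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 2) offsets
    D = np.sqrt((diff ** 2).sum(-1))                 # Euclidean distances
    D_max = np.hypot(H_grid, W_grid)                 # grid diagonal
    return D, D_max
```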

Let $S \subseteq \{0,\ldots,N-1\}$ denote the active retained token set and $C$ its complement. For $i \in C$, define the normalized nearest-neighbor spatial distance

$$\delta_i(S) = \min_{j \in S} D_{ij}^{(\mathrm{sp})}, \qquad \bar\delta_i(S) = \frac{\delta_i(S)}{D_{\max}}$$

BSS modulates similarity as

$$\widetilde M_{ij} = M_{ij}\bigl(1 + \lambda\,\bar\delta_i(S)\bigr)$$

with buffering strength $\lambda \geq 0$ (the authors set $\lambda = 0.5$). This augmentation increases the apparent redundancy of candidates far from $S$, deferring their selection. The surrogate selection score is

$$r_i = 1 - \max_{j\in S} \widetilde M_{ij}$$

Candidates are accepted if $\max_{j\in S} \widetilde M_{ij} < \tau^{(t)}$, where the threshold $\tau^{(t)}$ follows the schedule described below.
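A minimal sketch of the modulated similarity and selection score, assuming $M$ and $D$ are precomputed (function and variable names are ours):

```python
import numpy as np

def bss_scores(M, D, S_idx, C_idx, D_max, lam=0.5):
    """BSS-modulated redundancy and selection scores for candidates.

    M: (N, N) cosine similarities; D: (N, N) spatial distances.
    S_idx: retained token indices; C_idx: candidate indices.
    Returns (max_j M~_ij, r_i = 1 - max_j M~_ij) per candidate.
    """
    delta = D[np.ix_(C_idx, S_idx)].min(axis=1)   # delta_i(S)
    delta_bar = delta / D_max                     # normalized to [0, 1]
    M_tilde = M[np.ix_(C_idx, S_idx)] * (1.0 + lam * delta_bar[:, None])
    max_sim = M_tilde.max(axis=1)                 # apparent redundancy
    return max_sim, 1.0 - max_sim                 # surrogate score r_i
```

A candidate far from every retained token gets a larger multiplier, so its apparent redundancy rises and its score $r_i$ falls, deferring its selection.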

2. Algorithmic Workflow and Parallel Greedy Selection

The pruning process follows a centrifugal paradigm:

| Step | Description | Hyperparameters |
|------|-------------|-----------------|
| 1 | Channel screening | $q$ |
| 2 | Cosine similarity $M$ | — |
| 3 | Pivot initialization | $\kappa$ pivots; Eq. 3.6 |
| 4 | Compute all $D_{ij}^{(\mathrm{sp})}$ and $D_{\max}$ | — |
| 5 | Set threshold schedule | $\tau^{(0)} = 0.8$, $\Delta\tau = 0.1$, $t = 0$ |
| 6 | Main pruning loop while $|S| < R$ | buffering strength $\lambda$ |

Candidates in $C$ are ranked by $r_i$ and processed in parallel batches of size $B$. Within each batch, candidates are evaluated against the up-to-date $S$; those meeting the threshold condition are added. If no additions occur in a round, the threshold is incremented: $\tau^{(t+1)} = \tau^{(t)} + \Delta\tau$. The process continues until $|S| = R$ or the failsafe $\tau > 1 + \lambda$ is reached.
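The main loop can be sketched as follows (batches are processed sequentially here for clarity, whereas the paper evaluates them in parallel; the function name and the default for $B$ are ours):

```python
import numpy as np

def bss_prune(M, D, D_max, pivots, R, lam=0.5, tau0=0.8, d_tau=0.1, B=4):
    """Centrifugal greedy selection with an annealed acceptance threshold.

    M: (N, N) cosine similarities; D: (N, N) spatial distances.
    pivots: initial retained indices; R: target retained-set size.
    """
    N = M.shape[0]
    S = list(pivots)
    tau = tau0
    while len(S) < R and tau <= 1.0 + lam:   # failsafe: stop if tau > 1 + lam
        C = [i for i in range(N) if i not in S]
        # rank candidates by r_i (least redundant first)
        delta_bar = D[np.ix_(C, S)].min(axis=1) / D_max
        m_tilde = (M[np.ix_(C, S)] * (1 + lam * delta_bar[:, None])).max(axis=1)
        order = np.argsort(m_tilde)
        added = False
        for start in range(0, len(C), B):
            for k in order[start:start + B]:
                if len(S) >= R:
                    break
                i = C[k]
                # re-check the threshold condition against the up-to-date S
                db = D[i, S].min() / D_max
                if (M[i, S] * (1 + lam * db)).max() < tau:
                    S.append(i)
                    added = True
        if not added:
            tau += d_tau                     # anneal: relax the threshold
    return S
```

Re-checking each candidate against the current $S$ inside the batch is what keeps the greedy selection consistent as tokens are added.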

Discarded tokens are clustered by nearest pivot ($\arg\max_{j\in S} M_{uj}$), and the final representation is formed by similarity-weighted aggregation (SWA):

$$\alpha_{u\to j} = \frac{M_{uj}}{\sum_{u' \in D_j} M_{u'j} + \epsilon}, \qquad E_j = \sum_{u\in D_j} \alpha_{u\to j} H_u, \qquad H_j \leftarrow \beta H_j + (1-\beta)E_j$$

with $\beta = 0.3$.
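A sketch of the SWA step, assuming the cluster assignments $D_j$ have already been computed (function and variable names are ours):

```python
import numpy as np

def swa_fuse(H, M, clusters, beta=0.3, eps=1e-6):
    """Similarity-weighted aggregation of discarded tokens into pivots.

    H: (N, d) hidden states; M: (N, N) cosine similarities.
    clusters: dict mapping pivot index j -> list of discarded indices D_j.
    Returns an updated copy of H with H[j] <- beta*H[j] + (1-beta)*E_j.
    """
    H = H.copy()
    for j, Dj in clusters.items():
        if not Dj:
            continue
        w = M[Dj, j]                  # similarity of discarded tokens to pivot j
        alpha = w / (w.sum() + eps)   # normalized aggregation weights
        E_j = alpha @ H[Dj]           # similarity-weighted cluster evidence
        H[j] = beta * H[j] + (1 - beta) * E_j
    return H
```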

3. Spatial Buffering and Redundancy Modulation

The central principle of BSS is the spatial modulation of redundancy. The minimum spatial distance $\delta_i(S)$, normalized to $[0,1]$ by $D_{\max}$, directly controls the penalization factor $1 + \lambda\bar\delta_i(S)$. Candidates spatially farther from any retained token are more aggressively up-weighted in redundancy:

  • Early, strict $\tau$ favors tokens in local neighborhoods of current pivots (low redundancy, dense detail).
  • As $\tau$ increases, acceptance of more remote tokens allows progressive spatial coverage.
  • The buffer parameter $\lambda$ mediates the speed and strength of outward expansion: higher $\lambda$ defers far tokens more strongly.

Geometrically, the newly added tokens are preferentially those that are simultaneously low-redundancy and spatially proximal, balanced by the threshold annealing schedule. The approach enforces a principled "buffer" around $S$, yielding orderly near-to-far token selection, as confirmed by qualitative examples (e.g., Fig. 3 of Wu et al., 2 Dec 2025).

4. Trade-Offs: Redundancy Versus Spatial Coverage

Prior methods based purely on importance tend to over-select from small regions, redundantly wasting capacity. Conversely, pure diversity-based selection over-disperses, missing critical local structure.

BSS provides a calibrated compromise:

  • Centrifugal, near-to-far selection enables rapid local detail preservation near pivots.
  • A gradually relaxed acceptance threshold (the increasing $\tau^{(t)}$ schedule) ensures global spatial coverage.
  • The resulting token set covers both fine object details and broader image context.

This design demonstrably mitigates the classic trade-off between redundancy reduction and spatial coverage: local features are preserved before the selection moves to semantically and visually distinct, distant regions.

5. Empirical Effects and Comparative Evaluation

Empirical evaluation (Wu et al., 2 Dec 2025) shows that BSS-equipped VLM-Pruner achieves high accuracy under extreme sparsity and significant improvements relative to baselines. For instance:

  • On LLaVA-1.5-7B at 88.9% pruning (retain 64/576 tokens), VLM-Pruner with BSS retains 95.61% of model accuracy, outperforming DivPrune (93.68%) and DART (92.71%).
  • On OCRBench, VLM-Pruner yields a 1.19× end-to-end inference speedup (FLOPs reduced to 22.1% of baseline) while maintaining accuracy; comparable DART speedup is 1.22× but with lower accuracy.
  • On Qwen2-VL-7B, equivalent sparsity yields 92.58% retained accuracy and a 1.60× speedup.
  • Across five VLMs and thirteen benchmarks, BSS consistently outperforms both importance-based and diversity-based approaches, with particularly strong gains at high sparsity.

6. Implementation Notes and Hyperparameters

Key hyperparameters for practical implementation include:

  • Buffering strength: $\lambda = 0.5$
  • Pruning threshold schedule: $\tau^{(0)} = 0.8$, increment $\Delta\tau = 0.1$
  • Channel screening dimension: $q$
  • Initial pivot count: $\kappa$
  • Batch size for parallelism: $B$
  • Similarity-weighted aggregation: $\beta = 0.3$
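These settings might be collected into a single configuration object (a sketch; field names are ours, and since $q$, $\kappa$, and $B$ have no single published default here, they are left as required fields):

```python
from dataclasses import dataclass

@dataclass
class BSSConfig:
    """Hyperparameters for VLM-Pruner with BSS (field names are ours).

    lambda_, tau0, delta_tau, and beta use the values reported in the
    paper; q, kappa, and B are model-dependent and must be supplied.
    """
    q: int                   # channel-screening dimension
    kappa: int               # initial pivot count
    B: int                   # parallel batch size
    lambda_: float = 0.5     # buffering strength
    tau0: float = 0.8        # initial pruning threshold
    delta_tau: float = 0.1   # threshold increment per stalled round
    beta: float = 0.3        # SWA mixing coefficient
```

Example usage (the values for `q`, `kappa`, and `B` below are illustrative placeholders, not paper settings): `cfg = BSSConfig(q=64, kappa=4, B=32)`.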

All required operations (channel screening, nearest-neighbor search, batching, aggregation) are suitable for efficient parallelization. The process is amenable to integration into existing token-pruning pipelines for vision-language models.

7. Summary and Significance

Buffering for Spatial Sparsity acts as a spatially aware re-weighting scheme, modifying pairwise visual-token similarity with normalized spatial distance, embedded within an annealed-threshold greedy pruning framework. This approach results in orderly, centrifugal token selection that maintains both semantic and spatial coverage, addressing key shortcomings of previous redundancy- or diversity-guided pruning designs. Comprehensive evaluations affirm its effectiveness at high sparsity across multiple VLM architectures and tasks (Wu et al., 2 Dec 2025).
