
BLC: Best Low-Rank Approximation under Clipping

Updated 16 January 2026
  • The paper introduces BLC as a novel method that integrates low-rank extraction with outlier clipping to minimize quantization error.
  • It employs an alternating iterative approach that refines rank approximations and clipping thresholds, ensuring rapid convergence and minimal overhead.
  • Empirical validations show significant improvements in perplexity and quantization fidelity for 2–4 bit post-training quantization of large language models.

Best Low-rank Approximation under Clipping (BLC) is a method central to the FLRQ (Flexible Low-Rank Quantization) framework, designed for efficient and accurate post-training quantization (PTQ) of LLMs. BLC focuses on minimizing quantization error by alternating scalable low-rank extraction with outlier clipping and quantization of the residual, thereby providing robust, near-monotone error decrease and high quantization quality at minimal computational overhead (Gul et al., 9 Jan 2026).

1. Formal Definition and Objective

BLC addresses the quantized low-rank approximation of a neural network weight matrix $W \in \mathbb{R}^{m \times n}$, targeting efficient storage and inference without costly fine-tuning. Given $W$, a desired bit-width $d$ for uniform quantization, and a calibration activation matrix $X \in \mathbb{R}^{n \times B}$ (with $B$ activation samples), the goal is to decompose $W$ into:

  • A rank-$r$ matrix $W_r = U V^T$ with $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$,
  • A clipping threshold $\tau \geq 0$,
  • A quantized residual $W_q = \mathrm{Quant}\left( \mathrm{Clip}(W - W_r, \tau), d \right)$,

such that the end-to-end error on the calibration batch,

E(U, V, \tau) := \| W X - [\, W_r + W_q \,] X \|_2,

is minimized. In matrix norm notation, the objective is:

\min_{U, V, \tau \geq 0} \left\| W - \left[ U V^T + \mathrm{Quant}\left( \mathrm{Clip}(W - U V^T, \tau), d \right) \right] \right\|_F^2,

where the clipping operator is $\mathrm{Clip}(A, \tau)_{ij} = \mathrm{sign}(A_{ij}) \cdot \min(|A_{ij}|, \tau)$, and quantization uses uniform $d$-bit codebooks (with possible group-wise scaling or zero-point schemes).
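The clipping and quantization operators can be sketched in NumPy. The symmetric per-tensor scheme below is a simplification for illustration; the paper also allows group-wise scaling and zero-point variants:

```python
import numpy as np

def clip(A, tau):
    # Element-wise clipping: sign(A_ij) * min(|A_ij|, tau)
    return np.sign(A) * np.minimum(np.abs(A), tau)

def quant(A, d, tau):
    # Symmetric uniform d-bit quantization of values in [-tau, tau];
    # a simplified stand-in for group-wise or zero-point codebooks.
    levels = 2 ** (d - 1) - 1            # e.g. 7 positive levels at d = 4
    scale = tau / levels                 # assumes tau > 0
    return np.clip(np.round(A / scale), -levels, levels) * scale
```

For already-clipped inputs, the worst-case rounding error per entry is $\text{scale}/2 = \tau / (2(2^{d-1} - 1))$, which is the quantity the threshold search trades off against the saturation of outliers.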

2. Optimization Motivation

Low-rank preconditioning with $W_r$ captures the majority of the variance in $W$, isolating heavy-tailed outliers in the residual $R = W - W_r$. Directly quantizing $R$ can produce excessive rounding error, especially from outlier values. Clipping $R$ to $\pm\tau$ reduces the dynamic range seen by the quantizer, improving quantization fidelity for the bulk of residual entries at the expense of saturating a small fraction of large outliers at $\pm\tau$.

A joint, closed-form optimization over all variables $(U, V, \tau)$ is computationally intractable. BLC therefore uses an alternating iterative approach:

  • Recompute the best rank-$r$ low-rank component of the current residual (via R1-Sketch flexible rank selection).
  • Search for the optimal clipping threshold $\tau$ that, after quantizing the residual, minimizes calibration error.

The working assumption is that reproducing $WX$ with small $\ell_2$ error on the calibration set is a sufficient proxy for maintaining accuracy post-quantization.
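Under this proxy, the quantity being minimized is simply the mismatch of the compressed layer's output on the calibration batch. A minimal sketch (using the spectral norm, matching the $\|\cdot\|_2$ in the definition above):

```python
import numpy as np

def calib_error(W, W_r, W_q, X):
    # || W X - (W_r + W_q) X ||_2 on calibration activations X
    return np.linalg.norm(W @ X - (W_r + W_q) @ X, ord=2)
```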

3. Iterative BLC Procedure

The central loop of BLC alternates between low-rank extraction and clipping/quantization threshold search. The following pseudocode outlines the key steps within the context of FLRQ:

W_r = initial_low_rank(W)       # via SVD or R1-FLR
W_q = Quant(Clip(W - W_r, τ0), d)
bestE = +∞
best_Wr, best_Wq = W_r, W_q

for epoch in range(epochs):
    E = || W X - (W_r + W_q) X ||_2
    if E < bestE:
        bestE = E
        best_Wr, best_Wq = W_r, W_q
    R = W - W_q
    U, V = R1-FLR(R)             # Flexible-rank low-rank extraction
    W_r = U V^T
    bestτ, bestWq = argmin_{τ ∈ τ_grid} || W X - (W_r + Quant(Clip(W - W_r, τ), d)) X ||_2
    W_q = bestWq

return best_Wr, best_Wq

R1-FLR leverages the fast Rank-1 Sketch (with Gaussian projection) to select a layer- and data-dependent rank efficiently, allowing for outlier-aware low-rank representations. The clipping-threshold search is performed over $10$ logarithmically spaced fractions of the residual's maximum absolute value.
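The threshold search in the loop above can be sketched as an exhaustive scan. The grid's lower bound (two decades below the residual's maximum, `lo_frac=1e-2`) is an assumption for illustration; the source only specifies 10 logarithmically spaced fractions:

```python
import numpy as np

def search_clip_threshold(W, W_r, X, d=4, num=10, lo_frac=1e-2):
    # Scan `num` log-spaced fractions of max|W - W_r| as clipping
    # thresholds; keep the one minimizing the calibration error.
    R = W - W_r
    taus = np.abs(R).max() * np.logspace(np.log10(lo_frac), 0.0, num)
    levels = 2 ** (d - 1) - 1
    best_tau, best_Wq, best_err = None, None, np.inf
    for tau in taus:
        Rc = np.sign(R) * np.minimum(np.abs(R), tau)          # Clip
        s = tau / levels
        Wq = np.clip(np.round(Rc / s), -levels, levels) * s   # Quant
        err = np.linalg.norm(W @ X - (W_r + Wq) @ X, ord=2)
        if err < best_err:
            best_tau, best_Wq, best_err = tau, Wq, err
    return best_tau, best_Wq
```

Because the grid is fixed and small, this inner search adds only a handful of clipped-quantize passes per epoch.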

4. Theoretical Properties

The theoretical properties of BLC arise from the error guarantees of R1-Sketch and the monotone error reduction obtained via alternating minimization:

  • With $r = 1$ and $it$ power iterations, the expected R1-Sketch error is bounded as

\mathbb{E}\,\| A - A_1 \| \leq \sigma_2 + \left[ 1 + 4\sqrt{2n} \right]^{1/(it+1)} \sigma_2,

guaranteeing top-singular-vector approximation within a small constant factor per power iteration (Halko et al., 2011).

  • Each BLC epoch does not increase the calibration error $E$; storing the best solution ensures a non-increasing objective. Although the problem is non-convex, convergence within $O(1)$ epochs is typical for 3–4 bit quantization, and $O(10\text{–}20)$ epochs for 2 bits.
  • Per-epoch complexity is dominated by the cost of R1-Sketch (i.e., $O(it \cdot n^2)$, implemented as GEMV BLAS-2 operations) and the threshold search (a small loop over preselected $\tau$ values). Empirically, R1-Sketch with $it = 2$ costs only $6$ GEMV operations and is 2–5$\times$ faster than truncated SVD (Gul et al., 9 Jan 2026).
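The GEMV-only structure behind these complexity claims can be illustrated with a plain randomized rank-1 power iteration in the spirit of Halko et al.; this is a generic sketch, not the paper's R1-FLR, which additionally adapts the rank:

```python
import numpy as np

def r1_sketch(A, it=2, rng=None):
    # Randomized rank-1 power iteration: only matrix-vector (BLAS-2)
    # products with A and A^T, never a full SVD.
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = A.shape
    v = rng.standard_normal(n)          # Gaussian start vector
    v /= np.linalg.norm(v)
    for _ in range(it):
        u = A @ v
        u /= np.linalg.norm(u)
        v = A.T @ u
        v /= np.linalg.norm(v)
    u = A @ v
    s = np.linalg.norm(u)               # approximate top singular value
    return u / s, s, v                  # approximate (u1, sigma1, v1)
```

Subtracting `s * np.outer(u, v)` and repeating is one way a flexible rank could be grown one term at a time, which matches the per-term GEMV cost cited above.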

5. Practical Parameter Choices

Practical deployment of BLC involves several empirically validated settings:

  • R1-Sketch with $it = 2$ is sufficient to match truncated SVD accuracy at a fraction of the computational cost (see Table 13 in (Gul et al., 9 Jan 2026)).
  • Quantization is performed using group size $128$, consistent with the AWQ default.
  • Calibration data consists of $128$ sequences of $2048$ tokens from WikiText2.
  • The initial low-rank component $W_r$ can be derived via a full SVD or a single R1-FLR pass.
  • Clipping thresholds $\tau$ are searched over $10$ logarithmically spaced fractions of the absolute max of the current residual.
  • The recommended number of BLC epochs is 1–2 for 4-bit quantization, 2–5 for 3 bits, and 10–30 for 2 bits (see Figure 1 and Table 15 in (Gul et al., 9 Jan 2026)).

6. Empirical Validation and Comparisons

BLC achieves state-of-the-art accuracy and efficiency in quantized LLMs. Key experimental results on benchmark models and tasks include:

  • On OPT-1.3B with W3A16 quantization, BLC improves perplexity (PPL) from 15.80 (without BLC) to 15.53 (with BLC), a delta of $-0.27$.
  • For W2A16 on OPT-1.3B, PPL without BLC rises above $10^4$ due to overflow; BLC reduces this to 22.99.
  • On six zero-shot tasks for LLaMA2-7B at 3 bits, FLRQ+BLC improves average accuracy from 53.7% to 54.4%.
  • Ablation shows that removing BLC degrades 2-bit OPT-1.3B PPL from 22.99 to 29.32.
  • Throughput and latency overhead of FLRQ+LoRA on W4A16 is only 4–6% relative to the baseline (Figure 2).
  • Compared with fixed-rank LQER at rank 256, adaptive FLRQ achieves similar or better PPL with average rank ≈ 39 and extra storage ≈ 0.24 bits per parameter (Table 9).

BLC thus provides an alternating low-rank extraction and outlier quantization procedure with robust calibration loss reduction, fast convergence, and minimal overhead, underpinning accurate and efficient 2–4 bit quantization for large-scale models (Gul et al., 9 Jan 2026).
