Successive Randomized Compression (SRC)
- SRC is a framework that uses randomized techniques to compress and approximate data, improving efficiency in statistical learning, tensor network contraction, and lossy coding.
- Its instantiations span boosting (voting-classifier compression), sparse regression codes for lossy compression, and Khatri–Rao sketching for tensor network contraction, each balancing trade-offs between accuracy, complexity, and generalization.
- Empirical results demonstrate that SRC methods offer significant speedups and tighter performance bounds compared to conventional approaches.
Successive Randomized Compression (SRC) is a framework and set of algorithmic primitives that leverage randomization for efficient compression, representation, and approximation in statistical learning, tensor network contraction, and lossy coding. SRC principles are instantiated in boosting theory, sparse regression codes for lossy compression, and randomized tensor contractions for quantum simulations. Each instantiation exploits randomized selection, sketching, or greedy column choice to balance trade-offs between accuracy, complexity, and generalization.
1. Formal Definitions and Variants
Randomized Sample Compression Schemes. Formally, a randomized compression scheme operates on labeled data and consists of:
- A distribution over deterministic encoding maps $\kappa$, each mapping an input sample $S$ of size $n$ to a subsequence $\kappa(S)$ of at most $k$ distinct points of $S$.
- A deterministic reconstruction function $\rho$ that, given any compressed sequence of at most $k$ points, returns a predictor $\rho(\kappa(S))$.
- Probabilistic consistency: for every sample $S$ of size $n$, the reconstructed predictor $\rho(\kappa(S))$ classifies all points of $S$ correctly with probability at least $1-\varepsilon$ over the random choice of encoding map, where $\varepsilon$ is the consistency error.
- Stability: conditioning on the selected subsequence $\kappa(S)$ leaves the conditional law of the remaining points of $S$ unchanged, a property necessary for optimal generalization bounds (Cunha et al., 5 Feb 2024). A toy interface sketch follows this list.
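To make the interface concrete, here is a toy, non-authoritative sketch (not the construction of Cunha et al.): a size-2 compression scheme for one-dimensional threshold classifiers whose encoder works on a random subsample, so consistency on the full sample holds only with high probability. All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(X, y, subsample=20):
    """Randomized encoder for 1-D threshold data: draw a random subsample, then
    keep its rightmost negative and leftmost positive point (compression size k = 2).
    Because of the subsampling, consistency on the full sample is only probabilistic."""
    idx = rng.choice(len(X), size=subsample, replace=False)
    Xs, ys = X[idx], y[idx]
    return np.array([Xs[ys == -1].max(), Xs[ys == +1].min()])

def reconstruct(compressed):
    """Deterministic reconstruction: a threshold classifier at the midpoint
    of the two retained points."""
    t = compressed.mean()
    return lambda x: np.where(x > t, +1, -1)

# Linearly separable 1-D sample with labels in {-1, +1}.
X = rng.uniform(-1.0, 1.0, size=500)
y = np.where(X > 0.2, +1, -1)

h = reconstruct(encode(X, y))
print("consistency error on the sample:", np.mean(h(X) != y))  # small w.h.p., not always zero
```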
Sparse Regression Codes (SPARC, also termed SRC by some authors). In lossy compression, such a code is built from a random Gaussian design matrix partitioned into sections, with one column per section selected by a successive greedy procedure to approximate the source sequence (Venkataramanan et al., 2012).
Successive Randomized Compression in Tensor Networks. For matrix product operator (MPO) and matrix product state (MPS) contractions, SRC is a randomized, single-pass sweep that uses Khatri–Rao product sketches for efficient compression of the MPO–MPS product (Camaño et al., 8 Apr 2025).
2. Algorithmic Principles Across Domains
In Learning Theory and Boosting
- SRC for boosting (SRC-Boost) repeatedly subsamples small batches from weighted distributions, trains weak learners on them, and compresses the voting classifier as the concatenation of subsamples (Cunha et al., 5 Feb 2024).
- The reconstruction function retrains the weak learners on the selected subsamples and aggregates their votes to form the final classifier (a minimal sketch of this loop follows below).
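The loop can be sketched as follows, assuming decision stumps from scikit-learn as weak learners and a simple multiplicative reweighting; batch sizes, round counts, and the reweighting rule are illustrative stand-ins, not the exact procedure of Cunha et al. (5 Feb 2024).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # decision stumps as weak learners (an assumption)

rng = np.random.default_rng(0)

def src_boost(X, y, rounds=30, batch=64):
    """Sketch of an SRC-Boost-style loop (labels y in {-1, +1}): subsample small
    batches from a weighted distribution, train a weak learner on each batch, and
    keep the batches themselves as the compressed representation of the voting classifier."""
    n = len(X)
    w = np.full(n, 1.0 / n)                  # weights over training points
    compressed = []                          # concatenation of the sampled batches
    for _ in range(rounds):
        idx = rng.choice(n, size=batch, replace=True, p=w)
        stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        miss = stump.predict(X) != y
        w = np.where(miss, 2.0 * w, w)       # illustrative reweighting, not the paper's rule
        w /= w.sum()
        compressed.append((X[idx], y[idx]))
    return compressed

def reconstruct(compressed):
    """Reconstruction: retrain one weak learner per stored batch and majority-vote."""
    stumps = [DecisionTreeClassifier(max_depth=1).fit(Xb, yb) for Xb, yb in compressed]
    def vote(Z):
        votes = np.sum([s.predict(Z) for s in stumps], axis=0)
        return np.where(votes >= 0, +1, -1)
    return vote
```

The compressed object is just the list of sampled batches; reconstruction never needs the trained weak learners themselves, only the data required to retrain them.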
In Lossy Compression (SPARC/SRC)
- The codebook is an $n \times LM$ random Gaussian design matrix partitioned into $L$ sections of $M$ columns each.
- Greedy encoding: at each stage $i = 1, \dots, L$, the column of section $i$ maximizing the inner product with the current residual is selected, and the residual is updated by subtracting that column's contribution. The coefficient assigned to each section is predetermined to minimize MSE (Venkataramanan et al., 2012).
- The compressed representation consists of the $L$ selected column indices, which are transmitted to the decoder (a numerical sketch of this encoder follows below).
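A minimal numerical sketch of this successive encoder is given below. For simplicity the per-section coefficient is computed by projection onto the chosen column (close in spirit to the norm-squared minimization variant mentioned in Section 6), whereas the cited construction fixes the coefficients in advance so that only the indices need to be transmitted; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparc_encode(x, L=32, M=64):
    """Greedy successive encoding sketch: one column per section is chosen to best
    match the current residual, and its (projection) coefficient is subtracted."""
    n = len(x)
    A = rng.standard_normal((n, L * M)) / np.sqrt(n)   # ~unit-norm Gaussian columns
    residual = x.astype(float).copy()
    indices = []
    for i in range(L):
        section = A[:, i * M:(i + 1) * M]
        scores = section.T @ residual
        j = int(np.argmax(np.abs(scores)))             # greedy column choice in section i
        c = scores[j] / np.dot(section[:, j], section[:, j])
        residual -= c * section[:, j]
        indices.append(j)
    return indices, residual

x = rng.standard_normal(256)                           # i.i.d. Gaussian source block
idx, r = sparc_encode(x)
print("per-sample distortion:", np.mean(r ** 2))       # shrinks as L and M grow
```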
In Tensor Network Contraction
- Operating from right to left, SRC applies randomized QB decompositions to local "unfoldings" of the partially contracted network, utilizing Khatri–Rao product sketches at each step (Camaño et al., 8 Apr 2025).
- Output cores are determined by thin QR factorizations of these sketches, and projection reduces the contraction size as the sweep progresses (a schematic one-step example is given below).
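The snippet below is a schematic of one such local step (not the exact routine of Camaño et al., 8 Apr 2025): a matrix unfolding is sketched with a Khatri–Rao structured random test matrix via scipy.linalg.khatri_rao, a thin QR of the sketch gives an orthonormal core, and projection gives the remaining factor. Shapes and ranks are illustrative.

```python
import numpy as np
from scipy.linalg import khatri_rao   # column-wise Khatri–Rao product

rng = np.random.default_rng(0)

def kr_sketch_qb(A_unfold, dims, rank):
    """One randomized QB step with a Khatri–Rao structured sketch.
    A_unfold : local unfolding of shape (rows, dims[0] * dims[1]).
    Returns Q (orthonormal basis) and B with A_unfold ≈ Q @ B."""
    n1, n2 = dims
    # Structured test matrix: Khatri–Rao product of two small Gaussian blocks,
    # matching the product structure of the unfolding's column index.
    Omega = khatri_rao(rng.standard_normal((n1, rank)),
                       rng.standard_normal((n2, rank)))      # shape (n1 * n2, rank)
    Q, _ = np.linalg.qr(A_unfold @ Omega)                    # thin QR of the sketch
    B = Q.T @ A_unfold                                       # projected factor
    return Q, B

# Toy unfolding with exact rank 8: the sketch recovers it (up to rounding).
rows, n1, n2, r = 64, 12, 10, 8
A = rng.standard_normal((rows, r)) @ rng.standard_normal((r, n1 * n2))
Q, B = kr_sketch_qb(A, (n1, n2), rank=r + 4)                 # small oversampling
print("relative error:", np.linalg.norm(A - Q @ B) / np.linalg.norm(A))
```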
3. Theoretical Guarantees and Performance Bounds
Generalization Bounds in Learning
- For any stable randomized compression scheme of size $k$ with consistency error $\varepsilon$, the generalization error is bounded, up to a universal constant, in terms of $k$, $\varepsilon$, the confidence parameter $\delta$, and the sample size $n$, with probability at least $1-\delta$ over both the sample and the random encoding (Cunha et al., 5 Feb 2024).
- SRC-Boost achieves this bound for voting classifiers, improving over AdaBoost by reducing the number of logarithmic factors in the sample size $n$ from two to one.
| Method | Bound on generalization error |
|---|---|
| SRC-Boost | single logarithmic factor in the sample size $n$ |
| AdaBoost | two logarithmic factors in the sample size $n$ |
Lossy Compression (SPARC/SRC)
- SPARC achieves the optimal rate–distortion function $D^*(R)$ of an i.i.d. Gaussian source (recalled below for reference).
- With encoding complexity polynomial in the block length, the probability of excess distortion decays exponentially in the block length $n$ for any fixed distortion gap above $D^*(R)$ (Venkataramanan et al., 2012).
- Robustness: for ergodic sources of variance $\sigma^2$, SPARC achieves the same distortion guarantee as for Gaussian sources of that variance.
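For reference, the Gaussian distortion–rate function invoked above, with a sample evaluation at one bit per sample:

$$
D^*(R) = \sigma^2\, 2^{-2R}, \qquad R(D) = \tfrac{1}{2}\log_2\!\frac{\sigma^2}{D}, \qquad \text{e.g. } R = 1 \;\text{bit/sample} \;\Rightarrow\; D^*(1) = \sigma^2/4 .
$$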
Tensor Network Contraction
- SRC with Khatri–Rao sketches recovers the compressed MPO–MPS product exactly (with probability one) whenever the output bond dimension is large enough for an exact representation.
- For standard Gaussian sketches, the approximation error satisfies explicit probabilistic bounds of the type familiar from randomized low-rank approximation (Camaño et al., 8 Apr 2025).
- In the regimes studied, SRC has lower computational complexity than density-matrix and randomized contract-then-compress approaches and is empirically significantly faster (Camaño et al., 8 Apr 2025).
4. Trade-Offs: Complexity, Accuracy, and Adaptivity
Lossy Compression Trade-offs
- Choosing the number of sections and the section size to grow polynomially with the block length keeps encoding complexity polynomial in $n$; shrinking the dictionary to reduce complexity widens the gap above the rate–distortion limit (Venkataramanan et al., 2012).
- The "Shannon codebook" (an unstructured random codebook with $2^{nR}$ independent codewords) achieves distortion converging to $D^*(R)$, but at exponential complexity and storage cost; the back-of-the-envelope comparison below makes the storage gap concrete.
SRC in Tensor Networks
- A single sweep compresses the MPO–MPS product, outperforming dense contraction schemes in both speed and error when the bond dimension is moderate.
- Empirically, oversampling and adaptive bond-selection strategies recover a near-optimal bond dimension with minimal extra computation.
| Method | Typical computational cost |
|---|---|
| Basic contract-then-compress | forms the full MPO–MPS product (bond dimension equal to the product of the MPO and MPS bond dimensions) before truncating |
| SRC (Khatri–Rao sketches) | single randomized sweep over sketched local unfoldings, avoiding the full intermediate product |
SRC-Boost Parameter Selection
- The subsample size, the number of boosting rounds, and the resulting compression size (rounds times subsample size) are set according to the weak learner's guarantees; the concrete choices are specified in (Cunha et al., 5 Feb 2024).
- Stable randomization ensures the optimal single-logarithmic dependence on the sample size $n$ in the generalization error.
5. Robustness and Extensions
Robustness Across Sources
- SPARC/SRC maintains optimal distortion for ergodic sources, provided sample variance converges (Venkataramanan et al., 2012).
- SRC-Boost maintains stability and consistency even under random subsampling and weak learner error, via margin analysis and union-bound reasoning (Cunha et al., 5 Feb 2024).
Extensions in Tensor Networks
- SRC with randomized sketches can be executed in parallel across sites, is non-iterative, and is applicable to time evolution, boundary contraction of PEPS, and thermal-state computation.
- Limitation: For small bond dimension or loose tolerance, deterministic methods may suffice or outperform randomized SRC; zip-up compression is sometimes faster at high approximation error (Camaño et al., 8 Apr 2025).
6. Empirical Evidence and Practical Guidelines
Lossy Compression Simulations
- Empirically, SPARC approaches the optimal Gaussian distortion $D^*(R)$ at low and moderate rates, and the finite-block-length gaps align with theoretical predictions.
- Variants like norm-squared minimization at each step slightly improve performance over pure inner-product maximization (Venkataramanan et al., 2012).
Tensor Network Benchmarks
- Synthetic MPO–MPS compression problems: SRC yields a significant speedup over traditional density-matrix and randomized contract-then-compress approaches, with matching error.
- Time evolution of quantum spin systems: SRC inserted into a Krylov expansion achieves a substantial speedup over conventional procedures.
- Heuristics: modest oversampling and adaptive tolerance strategies yield near-optimal bond selection with minimal extra computation (Camaño et al., 8 Apr 2025); a sketch of such an adaptive rank choice follows below.
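A generic illustration of these heuristics (not the specific routine of Camaño et al., 8 Apr 2025): a randomized QB step that sketches with a few extra columns of oversampling and then chooses the rank adaptively from the singular values of the small factor until a relative tolerance is met. Function and parameter names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_qb(A, max_rank, tol=1e-8, oversample=10):
    """Generic randomized low-rank compression with oversampling and adaptive
    rank truncation (a sketch of the heuristic, not the cited routine)."""
    # Range sketch with a few extra random columns for safety.
    Omega = rng.standard_normal((A.shape[1], max_rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)            # thin QR of the sketch
    B = Q.T @ A                               # small factor, A ≈ Q @ B
    # Adaptive rank: keep the leading singular directions of B until the
    # discarded tail falls below tol relative to the norm of B.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]       # tail[r] = ||s[r:]||_2
    keep = np.flatnonzero(tail <= tol * np.linalg.norm(s))
    r = int(keep[0]) if keep.size else min(max_rank, len(s))
    r = max(1, min(r, max_rank))
    return Q @ U[:, :r], s[:r, None] * Vt[:r]           # rank-r factors

# Example: a numerically low-rank matrix compressed adaptively.
A = rng.standard_normal((200, 40)) @ rng.standard_normal((40, 300))
Qr, Br = adaptive_qb(A, max_rank=60, tol=1e-8)
print("chosen rank:", Qr.shape[1],
      " relative error:", np.linalg.norm(A - Qr @ Br) / np.linalg.norm(A))
```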
SRC-Boost Illustration
- On synthetic data, the compressed classifier, retrained on the union of the stored subsamples, matches the desired generalization bound in practice, and the theoretical parameter choices keep the consistency error small for optimal generalization (Cunha et al., 5 Feb 2024); a usage sketch based on the Section 2 code follows.
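Assuming the src_boost/reconstruct sketch from Section 2 is in scope, such an illustration could be run on synthetic data roughly as follows (the dataset and parameter values are arbitrary):

```python
import numpy as np
from sklearn.datasets import make_classification  # toy synthetic data (an assumption)

X, y01 = make_classification(n_samples=2000, n_features=10, random_state=1)
y = np.where(y01 == 1, +1, -1)                    # relabel to ±1 for the voting sketch

compressed = src_boost(X, y, rounds=30, batch=64)  # functions from the Section 2 sketch
voter = reconstruct(compressed)
print("training error:", np.mean(voter(X) != y))
print("compressed size (points kept):", sum(len(yb) for _, yb in compressed))
```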
7. Comparison to Existing Methods and Future Directions
| Domain | Existing Method | SRC Advantage |
|---|---|---|
| Boosting | AdaBoost/margin bounds | Reduces the generalization bound from two logarithmic factors to one in the sample size $n$ |
| Lossy Compression | Shannon codebook, scalar quantizer | Polynomial complexity for near-optimal rate–distortion trade-off |
| Tensor Networks | Density-matrix, contract-then-compress, zip-up | Single-pass, fastest at tight tolerances, parallelizable |
Immediate open problems include designing polynomial-complexity encoders whose distortion gap decays at the information-theoretically optimal rate, further analysis of randomized sketch structures in tensor networks, and exploring adaptive schemes for broader classes of ergodic and heavy-tailed sources (Venkataramanan et al., 2012, Cunha et al., 5 Feb 2024, Camaño et al., 8 Apr 2025).