
Successive Randomized Compression (SRC)

Updated 26 December 2025
  • SRC is a framework that uses randomized techniques to compress and approximate data, improving efficiency in statistical learning, tensor network contraction, and lossy coding.
  • It integrates methods like boosting, sparse regression codes, and Khatri–Rao sketches to optimize trade-offs between accuracy, complexity, and generalization.
  • Empirical results demonstrate that SRC methods offer significant speedups and tighter performance bounds compared to conventional approaches.

Successive Randomized Compression (SRC) is a framework and set of algorithmic primitives that leverage randomization for efficient compression, representation, and approximation in statistical learning, tensor network contraction, and lossy coding. SRC principles are instantiated in boosting theory, sparse regression codes for lossy compression, and randomized tensor contractions for quantum simulations. Each instantiation exploits randomized selection, sketching, or greedy column choice to balance trade-offs between accuracy, complexity, and generalization.

1. Formal Definitions and Variants

Randomized Sample Compression Schemes. Formally, a randomized compression scheme operates on labeled data in $(X \times Y)^n$ and consists of the following components (a minimal interface sketch follows the list):

  • A distribution $D_\kappa$ over deterministic encoding maps $k$, mapping any input sample $S$ of size $n$ to a subsequence $k(S, n) \subseteq S$ of at most $s_n$ distinct points.
  • A deterministic reconstruction function $\rho$ that, given any compressed sequence $U$ of size $\leq s_n$, returns a predictor $\rho(U): X \to Y$.
  • Probabilistic consistency: For all $S$ of size $\leq n$, the reconstructed predictor $\rho(k(S, n))$ classifies all points in $S$ correctly with probability at least $1-\delta$.
  • Stability: Conditioning on $k(S', n) \subseteq S$ ensures the conditional law of $k(S', n)$ matches that of $k(S, n)$, a property necessary for optimal generalization bounds (Cunha et al., 5 Feb 2024).
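To make the interface concrete, here is a minimal sketch for a toy hypothesis class (1-D threshold classifiers), where the encoder keeps at most one point and the reconstructor rebuilds a threshold predictor from it. The function names, the toy class, and the all-negative fallback are illustrative assumptions rather than constructions from the cited paper; a genuinely randomized scheme would additionally draw the encoding map $k$ from $D_\kappa$, e.g., via a random seed.

```python
import numpy as np

# Toy sample compression scheme for 1-D threshold classifiers (illustrative,
# not from the cited paper): the encoder keeps at most one point (the leftmost
# positive example) and the reconstructor rebuilds a threshold predictor from it.

def encode(S):
    """Map a labeled sample S = [(x, y), ...] to a subsequence of size <= 1."""
    positives = [(x, y) for (x, y) in S if y == +1]
    if not positives:
        return []                              # empty compressed sequence
    x_min = min(x for x, _ in positives)
    return [(x_min, +1)]                       # keep the leftmost positive point

def reconstruct(U):
    """Return a predictor X -> Y from a compressed sequence U."""
    if not U:
        return lambda x: -1                    # all-negative fallback predictor
    threshold = U[0][0]
    return lambda x: +1 if x >= threshold else -1

# Consistency check on a realizable sample: the reconstructed predictor
# must label every point of S correctly.
xs = np.sort(np.random.default_rng(0).uniform(0, 1, size=20))
S = [(x, +1 if x >= 0.6 else -1) for x in xs]
h = reconstruct(encode(S))
assert all(h(x) == y for x, y in S)
```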

Sparse Regression Codes (SPARC, also termed SRC by some authors). In lossy compression, an SRC is constructed by partitioning a Gaussian design matrix into sections and selecting columns through a greedy randomized procedure for approximation (Venkataramanan et al., 2012).

Successive Randomized Compression in Tensor Networks. For matrix product operator (MPO) and matrix product state (MPS) contractions, SRC is a randomized, single-pass sweep that uses Khatri–Rao product sketches for efficient compression of the MPO–MPS product (Camaño et al., 8 Apr 2025).

2. Algorithmic Principles Across Domains

In Learning Theory and Boosting

  • SRC for boosting (SRC-Boost) repeatedly subsamples small batches from weighted distributions, trains weak learners on them, and compresses the voting classifier as the concatenation of subsamples $S_1 \oplus \cdots \oplus S_T$ (Cunha et al., 5 Feb 2024); a minimal sketch follows this list.
  • The reconstruction function retrains the weak learners on the selected subsamples and aggregates their votes to form the final classifier.
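The sketch below illustrates this loop with a decision-stump weak learner: weighted batches are drawn, stumps are fit on them, and the final voting classifier is rebuilt from the concatenated subsamples alone. The batch size, multiplicative reweighting rule, and tie-breaking are simplified stand-ins chosen for readability, not the exact procedure or constants analyzed in (Cunha et al., 5 Feb 2024).

```python
import numpy as np

def stump_fit(X, y):
    """Fit the best threshold stump (feature j, threshold t, sign s) on (X, y)."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = np.where(s * (X[:, j] - t) >= 0, 1, -1)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda Z: np.where(s * (Z[:, j] - t) >= 0, 1, -1)

def src_boost_compress(X, y, m, T, seed=0):
    """Draw T weighted batches of size m; return their concatenation S_1 ⊕ ... ⊕ S_T."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)                       # sample weights
    batches = []
    for _ in range(T):
        idx = rng.choice(n, size=m, replace=True, p=w)
        batches.append(idx)
        h = stump_fit(X[idx], y[idx])             # weak learner on the batch
        miss = h(X) != y
        w = w * np.where(miss, 2.0, 0.5)          # crude multiplicative reweighting
        w = w / w.sum()
    keep = np.concatenate(batches)                # compressed sample
    return X[keep], y[keep]

def reconstruct(Xc, yc, m):
    """Retrain one weak learner per stored batch and return the majority vote."""
    learners = [stump_fit(Xc[i:i + m], yc[i:i + m]) for i in range(0, len(yc), m)]
    return lambda Z: np.where(sum(h(Z) for h in learners) >= 0, 1, -1)

# Usage: compress, then rebuild the voting classifier from the compressed data alone.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.3 * X[:, 1] > 0, 1, -1)
Xc, yc = src_boost_compress(X, y, m=20, T=10)
vote = reconstruct(Xc, yc, m=20)
print("training accuracy:", np.mean(vote(X) == y))
```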

In Lossy Compression (SPARC/SRC)

  • The codebook is built as a Gaussian design matrix $\mathbf{A}$ partitioned into $L$ sections of $M$ columns each.
  • Greedy encoding: At each stage $i$, select the column $j_i$ maximizing the inner product with the residual, and reduce the residual iteratively. The coefficients $c_i$ for each section are predetermined to minimize MSE (Venkataramanan et al., 2012).
  • The compressed representation consists of the vector of selected indices $(j_1, \ldots, j_L)$ transmitted to the decoder; a toy encoder/decoder sketch follows this list.
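A toy encoder/decoder pair following this greedy scheme is sketched below. The coefficient profile $c_i$ is a heuristic geometric decay motivated by the expected per-step residual reduction, not the exact MSE-optimal choice of (Venkataramanan et al., 2012), and the block sizes are arbitrary.

```python
import numpy as np

def sparc_encode(x, A, L, M, c):
    """Greedy successive encoding: pick one column per section of A."""
    residual = x.copy()
    indices = []
    for i in range(L):
        section = A[:, i * M:(i + 1) * M]            # the M columns of section i
        j = int(np.argmax(section.T @ residual))     # max inner product with residual
        indices.append(j)
        residual = residual - c[i] * section[:, j]   # peel off the chosen column
    return indices, residual

def sparc_decode(indices, A, L, M, c):
    """Reconstruct x_hat = sum_i c_i * (chosen column of section i)."""
    return sum(c[i] * A[:, i * M + j] for i, j in enumerate(indices))

# Toy usage on a unit-variance Gaussian source block.
rng = np.random.default_rng(0)
n, L, M = 64, 16, 32
A = rng.normal(size=(n, L * M)) / np.sqrt(n)         # Gaussian design, ~unit-norm columns
x = rng.normal(size=n)                               # source block
alpha = 2 * np.log(M) / n                            # heuristic per-step energy reduction
c = np.sqrt(n * alpha) * (1 - alpha) ** (np.arange(L) / 2)  # illustrative coefficient profile
idx, _ = sparc_encode(x, A, L, M, c)
x_hat = sparc_decode(idx, A, L, M, c)
print("per-sample distortion:", np.mean((x - x_hat) ** 2))
```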

In Tensor Network Contraction

  • Operating from right to left, SRC applies randomized QB decompositions to local "unfoldings" of the partially contracted network, utilizing Khatri–Rao product sketches at each step (Camaño et al., 8 Apr 2025).
  • Output cores are determined by thin QR factorizations of these sketches, and projection reduces the contraction size as the sweep progresses; a generic randomized QB building block is sketched below.
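The generic building block is a one-pass randomized QB factorization whose test matrix is a Khatri–Rao (column-wise Kronecker) product, sketched below under assumed shapes; it illustrates the primitive, not the full MPO–MPS sweep of (Camaño et al., 8 Apr 2025).

```python
import numpy as np

def khatri_rao(G1, G2):
    """Column-wise Kronecker product: column k equals kron(G1[:, k], G2[:, k])."""
    m1, k = G1.shape
    m2, _ = G2.shape
    return (G1[:, None, :] * G2[None, :, :]).reshape(m1 * m2, k)

def randomized_qb(A, G1, G2):
    """One-pass QB factorization A ≈ Q @ B using a Khatri-Rao test matrix."""
    Omega = khatri_rao(G1, G2)          # structured random test matrix
    Y = A @ Omega                       # sketch of the column space of A
    Q, _ = np.linalg.qr(Y)              # thin QR gives an orthonormal basis
    B = Q.T @ A                         # project A onto that basis
    return Q, B

# Toy usage: A has a factorized column index of size n1 * n2 and exact rank r,
# so the sketch recovers it (up to round-off) with probability one.
rng = np.random.default_rng(0)
m, n1, n2, r, p = 40, 8, 8, 5, 4        # p = oversampling
A = rng.normal(size=(m, r)) @ rng.normal(size=(n1 * n2, r)).T
G1 = rng.normal(size=(n1, r + p))
G2 = rng.normal(size=(n2, r + p))
Q, B = randomized_qb(A, G1, G2)
print("relative error:", np.linalg.norm(A - Q @ B) / np.linalg.norm(A))
```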

3. Theoretical Guarantees and Performance Bounds

Generalization Bounds in Learning

  • For any stable randomized compression scheme of size $s_n$ and consistency error $\delta$, the generalization error satisfies $R_D(h) \leq C\,\frac{s_n + \ln(1/\beta)}{n}$ for some universal constant $C$, with probability at least $1-\beta$ over samples and random encoding (Cunha et al., 5 Feb 2024).
  • SRC-Boost achieves this bound for voting classifiers, improving over AdaBoost by reducing the dependence on $\ln n$ from two factors to one.
| Method | Bound on $R_D(f)$ |
| --- | --- |
| SRC-Boost | $O\left(\frac{(d+\ln(1/\gamma))\ln(n/\delta)}{\gamma^4 n}\right)$ |
| AdaBoost | $O\left(\frac{d\,\ln(n/d)\ln n}{\gamma^2 n}\right)$ |

Lossy Compression (SPARC/SRC)

  • SPARC achieves the optimal rate–distortion function $D^*(R)=\sigma^2 e^{-2R}$ for i.i.d. Gaussian sources.
  • With complexity per sample $O((n/\ln n)^2)$, the probability of excess distortion decays exponentially in $n$ for a fixed gap above $D^*(R)$ (Venkataramanan et al., 2012).
  • Robustness: For ergodic sources of variance $\sigma^2$, SPARC achieves the same distortion guarantee as for Gaussian sources.

Tensor Network Contraction

  • SRC with Khatri–Rao sketches recovers the compressed MPO–MPS product exactly (with probability one) if the bond dimension is sufficiently large for exact representation.
  • Approximation errors for standard Gaussian sketches satisfy $\mathbb{E}\,\|A-QB\|_F^2 \leq \left(1+\frac{r}{p-r-\alpha}\right)\|A-A_r\|_F^2$.
  • Computational complexity is $O(n D \chi \overline{\chi}^2)$ when $D \leq \chi = \overline{\chi}$, and SRC is empirically up to $10\times$ faster than density-matrix and randomized contract-then-compress approaches (Camaño et al., 8 Apr 2025).

4. Trade-Offs: Complexity, Accuracy, and Adaptivity

Lossy Compression Trade-offs

  • With $M=L^b$ and $L \approx n/\ln n$, encoding complexity is polynomial; reducing $M$ for lower complexity increases the gap above the rate–distortion limit by $O(\ln\ln n/(b\ln n))$ (Venkataramanan et al., 2012). A sizing example follows this list.
  • The "Shannon codebook" ($L=1$) achieves $O(1/\sqrt{n})$ convergence but at exponential complexity and storage cost.

SRC in Tensor Networks

  • A single sweep compresses the MPO–MPS product, outperforming dense contraction schemes in speed and error when the bond dimension $\overline{\chi}$ is moderate.
  • Empirically, oversampling and adaptive bond-selection strategies yield optimal bond dimensions with minimal computation.
| Method | Typical Computational Cost |
| --- | --- |
| Basic contract-then-compress | $O(n D^3 \chi^3)$ |
| SRC (Khatri–Rao sketches) | $O(n D \chi \overline{\chi}^2)$ |

SRC-Boost Parameter Selection

  • Subsample size $m = \Theta((d+\ln(1/\gamma))/\gamma^2)$; number of rounds $T = \Theta((1/\gamma^2)\ln(n/\delta))$; compression size $s_n = T m$ (Cunha et al., 5 Feb 2024). A sizing helper is sketched after this list.
  • Stable randomization ensures the optimal single-logarithmic dependence on $n$ in the generalization error.
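A small helper that instantiates these $\Theta(\cdot)$ choices is sketched below; the leading constants (set to 1 here) are not fixed by the asymptotic statement and are purely illustrative.

```python
import math

def src_boost_params(n, d, gamma, delta, c_m=1.0, c_T=1.0):
    """Illustrative SRC-Boost sizing: batch size m, rounds T, compression size s_n = T*m."""
    m = math.ceil(c_m * (d + math.log(1.0 / gamma)) / gamma ** 2)   # subsample size
    T = math.ceil(c_T * math.log(n / delta) / gamma ** 2)           # number of rounds
    return m, T, T * m

m, T, s_n = src_boost_params(n=1_000_000, d=5, gamma=0.25, delta=0.05)
print(f"m={m}, T={T}, s_n={s_n}, s_n/n={s_n / 1_000_000:.3f}")
```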

5. Robustness and Extensions

Robustness Across Sources

  • SPARC/SRC maintains optimal distortion for ergodic sources, provided sample variance converges (Venkataramanan et al., 2012).
  • SRC-Boost maintains stability and consistency even under random subsampling and weak learner error, via margin analysis and union-bound reasoning (Cunha et al., 5 Feb 2024).

Extensions in Tensor Networks

  • SRC with randomized sketches can be executed in parallel across sites, is non-iterative, and is applicable to time evolution, boundary contraction in PEPS, and thermal-state computation.
  • Limitation: For small bond dimension or loose tolerance, deterministic methods may suffice or outperform randomized SRC; zip-up compression is sometimes faster at high approximation error (Camaño et al., 8 Apr 2025).

6. Empirical Evidence and Practical Guidelines

Lossy Compression Simulations

  • Empirically, SPARC matches $D^*(R) = e^{-2R}$ at low and moderate rates (especially for $b=3$), and finite-$M,L$ gaps align with theoretical predictions.
  • Variants like norm-squared minimization at each step slightly improve performance over pure inner-product maximization (Venkataramanan et al., 2012).

Tensor Network Benchmarks

  • Synthetic problems ($n=100$, $D=\chi=50$): SRC yields up to $10\times$ speedup over traditional density-matrix and randomized contract-then-compress, with matching error.
  • Time evolution for quantum spin systems: SRC inserted in Krylov expansion achieves up to $181\times$ speedup over classical procedures.
  • Heuristics: Oversampling (e.g., $p=1.5\overline{\chi}$) and adaptive tolerance strategies yield optimal bond selection and minimal computation (Camaño et al., 8 Apr 2025).

SRC-Boost Illustration

  • For $n=100$ data points, $m=20$, $T=10$, the compressed classifier (retrained on the union of subsamples) matches the desired generalization bound $O((s+\ln(1/\delta))/n)$ in practice, and the theoretical choices of $m, T$ keep $s/n$ small for optimal generalization (Cunha et al., 5 Feb 2024).

7. Comparison to Existing Methods and Future Directions

| Domain | Existing Method | SRC Advantage |
| --- | --- | --- |
| Boosting | AdaBoost / margin bounds | Reduces the generalization bound from two $\ln n$ factors to one |
| Lossy Compression | Shannon codebook, scalar quantizer | Polynomial complexity for near-optimal rate–distortion trade-off |
| Tensor Networks | Density-matrix, contract-then-compress, zip-up | Single-pass, fastest at tight tolerances, parallelizable |

Immediate open problems include designing polynomial-complexity encoders whose distortion gap decays like the information-theoretic $O(n^{-1/2})$, further analysis of randomized sketch structures in tensor networks, and exploring adaptive schemes for broader classes of ergodic and heavy-tailed sources (Venkataramanan et al., 2012; Cunha et al., 5 Feb 2024; Camaño et al., 8 Apr 2025).
