Randomized Greedy Compression Methods
- Randomized greedy compression algorithms are techniques that combine local optimal choices with random sampling to approximate large-scale data efficiently.
- They leverage methods like pseudo-random dictionary matching, clustering coresets, and tensor-network sketching to balance computational speed with accuracy.
- These methods offer robust theoretical guarantees and practical scalability, finding applications in signal processing, sensor selection, and distributed computing.
A randomized greedy compression algorithm refers to any compression or summarization technique that marries greedy selection—incrementally building an approximation by locally optimal choices—with randomized procedures, such as sampling or randomized projections. These algorithms advance beyond classic deterministic greedy heuristics by targeting scalability, memory efficiency, and theoretical guarantees in large-scale or streaming contexts. Recent work spans sparse signal coding, sensor selection, tensor network compression, clustering coresets, and experiment design, employing randomization at either the dictionary, candidate pool, or sketching matrix levels.
1. Pseudo-Random Dictionary-Based Greedy Compression
Sparse random approximation for vector signals, as formulated by Andrecut ["Sparse Random Approximation and Lossy Compression" (Andrecut, 2011)], is a canonical instance. A pseudo-random dictionary is synthesized by sampling a vector with i.i.d. Bernoulli entries from a seeded generator and taking its overlapping windows as atoms $\varphi_m$. Greedy matching pursuit (MP) approximates the signal $x$ by iteratively subtracting from the residual $r$ its best-correlating atom, selected via $m^{*} = \arg\max_m |\langle r, \varphi_m \rangle|$. After $K$ steps, a $K$-term approximation is obtained.
The compression innovation (“CMP”) packs each selected atom's index and coefficient into a single word, leveraging the atoms' normalization and the reproducibility of the dictionary from the same seed. The decoder reconstructs the approximation from the packed pairs and the known seed, enabling asymptotically optimal lossy compression and bit-efficient encoding. Empirically, CMP matches MP accuracy at moderate compression ratios and degrades gracefully at higher ones. The same seed acts as an encryption key, rendering the approach secure against adversaries lacking the seed.
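As an illustration of this encode/decode pipeline, the sketch below builds a seeded pseudo-random dictionary, runs plain matching pursuit, and packs each (index, quantized coefficient) pair into one integer; the $\pm 1$ dictionary construction, the quantizer, and the packing layout are illustrative assumptions rather than the exact CMP format.

```python
import numpy as np

def cmp_encode(x, K, seed=0, coef_bits=16):
    """Greedy matching pursuit over a seeded pseudo-random dictionary,
    packing (atom index, quantized coefficient) into one integer word.
    Dictionary, quantizer, and packing layout are illustrative only."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Pseudo-random +/-1 sequence; atoms are its n overlapping length-n windows.
    s = rng.choice([-1.0, 1.0], size=2 * n)
    D = np.stack([s[m:m + n] for m in range(n)])
    D /= np.linalg.norm(D, axis=1, keepdims=True)      # unit-norm atoms

    r = x.astype(float).copy()
    scale = np.linalg.norm(x) + 1e-12                  # |coefficient| <= ||x||
    words = []
    for _ in range(K):
        corr = D @ r
        m = int(np.argmax(np.abs(corr)))               # best-correlating atom
        c = corr[m]
        r -= c * D[m]                                  # update the residual
        q = int(round((c / scale + 1.0) / 2.0 * (2**coef_bits - 1)))
        q = max(0, min(2**coef_bits - 1, q))           # quantize coefficient
        words.append((m << coef_bits) | q)             # pack index and coefficient
    return words, scale

def cmp_decode(words, scale, n, seed=0, coef_bits=16):
    """Rebuild the K-term approximation from the packed words and shared seed."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1.0, 1.0], size=2 * n)
    D = np.stack([s[m:m + n] for m in range(n)])
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    xhat = np.zeros(n)
    for w in words:
        m, q = w >> coef_bits, w & (2**coef_bits - 1)
        c = (q / (2**coef_bits - 1) * 2.0 - 1.0) * scale
        xhat += c * D[m]
    return xhat
```

Because the decoder regenerates the dictionary from the shared seed, only the packed words (plus a scale factor in this sketch) need to be transmitted.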
2. Randomized Greedy Sampling in Clustering and Coreset Construction
A structurally different mode appears in the construction of clustering coresets for $k$-center with outliers ["Randomized Greedy Algorithms and Composable Coreset for k-Center Clustering with Outliers" (Ding et al., 2023)]. Here, randomized greedy data selection proceeds by repeatedly sampling representatives from the farthest tail of points (those maximally distant from the current center set), with sample sizes calibrated by the error tolerance and failure probability. Iterative sampling ensures that every optimal cluster is probabilistically “hit” while outliers are disregarded.
A coreset $S$ preserves the clustering cost for every choice of $k$ centers, so clustering on $S$ yields near-optimal results for the full dataset. In doubling metrics, coresets constructed by randomized greedy selection are provably small, far smaller than prior constructions, and admit distributed construction with low communication. The technique achieves provable approximation factors for $k$-center with outliers at low computational cost.
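The farthest-tail sampling step can be sketched as follows; the tail size, the treatment of the $z$ presumed outliers, and the parameter names are illustrative assumptions rather than the calibrated quantities from the paper.

```python
import numpy as np

def randomized_greedy_centers(X, k, z, tail_size, seed=0):
    """Pick k centers by repeatedly sampling a representative uniformly from
    the farthest tail of points (of size z + tail_size), instead of always
    taking the single farthest point as deterministic greedy would."""
    rng = np.random.default_rng(seed)
    n = len(X)
    centers = [int(rng.integers(n))]                    # random first center
    dist = np.linalg.norm(X - X[centers[0]], axis=1)    # distance to center set
    for _ in range(k - 1):
        tail = np.argsort(dist)[-(z + tail_size):]      # farthest points
        c = int(rng.choice(tail))                       # random representative
        centers.append(c)
        dist = np.minimum(dist, np.linalg.norm(X - X[c], axis=1))
    return centers
```

With the tail size calibrated by the error tolerance and failure probability, every optimal cluster is hit with high probability even though up to $z$ of the farthest points may be outliers.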
3. Randomized Greedy Sensor Selection and Weak Submodularity
Randomized greedy approaches play a pivotal role in sensor selection for Kalman filtering and MMSE estimation ["Randomized Greedy Sensor Selection: Leveraging Weak Submodularity" (Hashemi et al., 2018)]. The set function $f$ (e.g., the reduction in Kalman filter error covariance) is weakly submodular: it is monotone but only exhibits bounded curvature rather than full submodularity. The classical greedy method maximizes $f$ under a cardinality constraint, and randomized sampling of candidate sensors at each iteration yields considerable computational savings.
Each iteration samples a small pool of candidate sensors, evaluates their marginal gains, and selects the best. The resulting guarantees match or approach the classical greedy approximation factor, degrading gracefully with the curvature bound. Empirical results show the randomized greedy variant to be substantially faster than classic greedy with negligible degradation in MMSE, an advantage that is most pronounced at large scale.
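A minimal sketch of the sampled-candidate loop follows, assuming a user-supplied marginal-gain oracle gain(S, j) (for instance, the reduction in MMSE or in the trace of the Kalman error covariance when sensor j is added to set S); the oracle and parameter names are placeholders.

```python
import numpy as np

def randomized_greedy_select(gain, n_candidates, budget, sample_size, seed=0):
    """At each of `budget` steps, evaluate marginal gains only on a random
    subset of the remaining candidates and keep the best one."""
    rng = np.random.default_rng(seed)
    selected, remaining = [], list(range(n_candidates))
    for _ in range(budget):
        pool = rng.choice(remaining,
                          size=min(sample_size, len(remaining)),
                          replace=False)
        best = max(pool, key=lambda j: gain(selected, int(j)))
        selected.append(int(best))
        remaining.remove(int(best))
    return selected
```

Each step costs sample_size oracle calls instead of one per remaining candidate, which is the source of the reported speed-ups.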
4. Group-Greedy Compression with Randomized Sketching
The randomized group-greedy method ["Randomized Group-Greedy Method for Large-Scale Sensor Selection Problems" (Nagata et al., 2022)] tackles sensor selection problems whose candidate pools are too large for exhaustive greedy evaluation. Rather than evaluating all candidates, the algorithm compresses the pool to a much smaller subset by uniform random sampling, or ‘sketching’. Group-greedy maintains several partial solutions per step, but only over the compressed candidate set, which sharply reduces the per-iteration cost. Elite-and-randomized group-greedy (ERGG) further augments the random pool with a small elite set, compensating for the quality loss due to randomization; this is especially valuable when the objective is non-submodular (E-optimality). Empirically, ERGG restores group-greedy solution quality at a substantial speed-up.
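The sketch below illustrates an ERGG-style loop under simplifying assumptions: the elite set is picked by singleton objective value and obj is a generic set-function oracle, neither of which is specified in exactly this form by the paper.

```python
import numpy as np

def ergg_select(obj, n_candidates, budget, group_size, pool_size, elite_size, seed=0):
    """Group-greedy on a compressed candidate pool: keep `group_size` partial
    solutions, and at every step extend them only by a random sample of the
    candidates augmented with a small elite set."""
    rng = np.random.default_rng(seed)
    all_c = np.arange(n_candidates)
    singles = np.array([obj([int(j)]) for j in all_c])
    elite = set(np.argsort(singles)[-elite_size:].tolist())   # best singletons
    groups = [[]]                                              # partial solutions
    for _ in range(budget):
        sampled = set(rng.choice(all_c, size=pool_size, replace=False).tolist())
        pool = sampled | elite                                 # compressed pool
        scored = []
        for S in groups:
            for j in pool - set(S):
                cand = S + [int(j)]
                scored.append((obj(cand), cand))
        scored.sort(key=lambda t: t[0], reverse=True)
        groups = [S for _, S in scored[:group_size]]           # keep best groups
    return groups[0]
```

Keeping several partial solutions hedges against a single bad greedy step, while the elite set guards against an unlucky random sample missing the strongest individual candidates.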
5. Tensor-Network Compression via Successive Randomized Greedy Sketching
In high-dimensional tensor networks, single-pass randomized greedy compression emerges as successive randomized compression (SRC) for matrix-product operators and states ["Successive randomized compression: A randomized algorithm for the compressed MPO-MPS product" (Camaño et al., 8 Apr 2025)]. SRC proceeds from right to left over the sites, at each step using a Khatri–Rao product of random Gaussian matrices to sketch the intermediate tensor network, applying a QR factorization, and greedily truncating to the target bond dimension.
Theoretical results guarantee exact recovery with probability 1 when the result is representable within the chosen bond dimension, and almost-optimal Frobenius error in general. SRC matches or exceeds SVD-based and density-matrix approaches in accuracy while being five or more times faster in large-scale settings, outperforming other greedy or randomized approaches, especially as the bond and product dimensions grow.
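The following simplified sketch conveys the flavor of a right-to-left randomized sweep on an already-formed MPS, using a plain Gaussian sketch and QR at every bond; the actual SRC algorithm instead sketches the MPO-MPS product with Khatri–Rao structured random matrices, which is not reproduced here.

```python
import numpy as np

def randomized_mps_compress(cores, chi, oversample=5, seed=0):
    """Right-to-left sweep that truncates each bond of an MPS (list of
    (r_left, d, r_right) cores) to bond dimension <= chi, using a Gaussian
    sketch plus QR in place of a full SVD at every bond."""
    rng = np.random.default_rng(seed)
    cores = [c.copy() for c in cores]
    for i in range(len(cores) - 1, 0, -1):
        rl, d, rr = cores[i].shape
        k = min(chi + oversample, rl)
        M = cores[i].reshape(rl, d * rr)
        G = rng.standard_normal((k, rl))            # sketch the row space of M
        Q, _ = np.linalg.qr((G @ M).T)              # orthonormal columns, (d*rr, k)
        Q = Q.T[:min(chi, Q.shape[1])]              # truncated row basis, <= chi rows
        cores[i] = Q.reshape(Q.shape[0], d, rr)     # new right core
        # Absorb M @ Q.T into the right bond of the neighboring core.
        cores[i - 1] = np.einsum('adb,bc->adc', cores[i - 1], M @ Q.T)
    return cores
```

Oversampling by a few extra columns is the standard randomized range-finder trick that keeps the truncation error close to the best rank-$\chi$ error at each bond.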
6. Theoretical Properties and Performance Guarantees
Randomized greedy compression methods universally exploit probabilistic selection to ensure coverage, approximation guarantees, or bounded error. In sparse signal coding (Andrecut, 2011), the restricted isometry property (RIP) of the random dictionary guarantees nearly orthogonal atom selection. In clustering (Ding et al., 2023), hitting-set arguments and Azuma’s inequality provide guarantees on cluster and outlier coverage. Sensor selection (Hashemi et al., 2018, Nagata et al., 2022) exploits bounded curvature and weak submodularity, with expectation and tail bounds on the achieved MMSE.
In tensor compression (Camaño et al., 8 Apr 2025), randomized QB factorization ensures that low-rank projections are unbiased and near-optimal, with exact reconstruction possible when rank constraints are met. Empirical studies demonstrate that randomized greedy methods track deterministic greedy counterparts closely up to high compression ratios, with substantial reductions in runtime or memory.
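For concreteness, a bare-bones randomized QB factorization in this spirit looks as follows; it is included only to illustrate the sketch-and-orthogonalize projection underlying these guarantees, not as the exact routine used in the cited work.

```python
import numpy as np

def randomized_qb(A, rank, oversample=10, seed=0):
    """Compute A ~= Q @ B with orthonormal Q from a single Gaussian sketch."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], rank + oversample))  # test matrix
    Q, _ = np.linalg.qr(A @ Omega)      # orthonormal basis for the sampled range
    B = Q.T @ A                         # small factor; A ~= Q @ B
    return Q, B
```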
7. Applications, Security, and Practical Considerations
Randomized greedy compression schemes have broad applications: sparse coding and lossy compression in digital signal processing (Andrecut, 2011), distributed and robust clustering (Ding et al., 2023), efficient large-scale sensor placement (Hashemi et al., 2018, Nagata et al., 2022), and scalable tensor-network contractions for quantum physics and machine learning (Camaño et al., 8 Apr 2025). In CMP, the PRNG seed acts as both dictionary generator and encryption key, yielding computational security due to the infeasibility of dictionary recovery without the seed. Combinatorial or sketch-based compression techniques reduce bandwidth and memory requirements in both centralized and distributed regimes, underpinning advances in scientific computing, robotics, and high-dimensional statistics.
Empirical results underscore the versatility and efficiency of these methods under diverse objectives and signal models. Randomized greedy compression adapts to the scale and structure of modern data problems, balancing optimality and feasibility by harnessing both greedy heuristics and the power of randomization in algorithm design.