Lossless Compression for Convex ERM
- The paper presents a lossless compression framework for convex ERM that generalizes color refinement to obtain exact instance compression across models.
- It preserves all global optima by requiring that gradients and aggregated feature sums be constant within the blocks of an equitable partition.
- Empirical evaluations on diverse datasets show substantial reductions in data dimensions and runtime while maintaining numerical accuracy.
A lossless compression framework for convex empirical risk minimization (ERM) enables computational reductions in optimization tasks without loss of solution accuracy. The framework introduced by Zhu & Chen is built on color refinement, a combinatorial technique from graph theory, generalized to operate on data or kernel matrices associated with convex, differentiable ERMs. It yields exact instance compression for problems including linear and polynomial regression, (multi)class logistic regression, elastic-net regularization, and kernelized methods, while preserving all global optima and producing measurable practical speedups (Zhu et al., 31 Jan 2026).
1. Convex ERM: Formulation and Lossless Compression
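Given feature vectors $x_i \in \mathbb{R}^d$, labels $y_i$, and positive weights $w_i$ for $i = 1, \dots, n$, convex ERM in standard (primal) form seeks to minimize:

$$F(\theta) = \sum_{i=1}^{n} w_i\, \ell(\theta^\top x_i, y_i) + R(\theta)$$

where $\ell$ is convex and differentiable in its first argument, and $R$ is a convex regularizer (e.g., ridge or elastic-net). The kernelized ERM form utilizes: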
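- Kernel matrix $K \in \mathbb{R}^{n \times n}$, with $K_{ij} = k(x_i, x_j)$
- Dual variables $\alpha \in \mathbb{R}^n$, bias $b$

Objective:

$$\sum_{i=1}^{n} w_i\, \ell\big((K\alpha)_i + b,\; y_i\big) + R_K(\alpha)$$

with the same loss $\ell$, weights $w_i$, and labels $y_i$ as in the primal form, and $R_K$ a convex regularizer on the dual variables (for example $\lambda\, \alpha^\top K \alpha$).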
A compression mapping is lossless if the compressed ERM has exactly the same set of solutions as the original problem.
2. Color Refinement and Equitable Partitions
Color refinement operates by representing the data or kernel matrix $A$ as a weighted bipartite graph, where rows ($i$) represent samples and columns ($j$) features/kernels. Color refinement seeks a pair of partitions $(\mathcal{P}, \mathcal{Q})$ (rows, columns) that are equitable:
- For each sample-block $P \in \mathcal{P}$ and feature-block $Q \in \mathcal{Q}$:
  - $\sum_{j \in Q} A_{ij}$ is constant over $i \in P$
  - $\sum_{i \in P} A_{ij}$ is constant over $j \in Q$
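The algorithm starts from a coarse partition, obtained by grouping samples and features by label, feature statistics, or other coarse signatures, and repeatedly splits blocks whose members disagree on their aggregated block sums. It stops at the coarsest equitable partition that refines the initial one. Standard color refinement runs in time quasilinear in the number of nonzero entries when the matrix is sparse, and quasilinear in the total number of entries when it is dense.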
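The refinement loop can be sketched as follows. This is a minimal, naive illustration rather than the paper's implementation: the function name `color_refinement`, the rounding tolerance `decimals`, and the dense NumPy representation are assumptions made here, and the loop does not attain the quasilinear running time of optimized implementations.

```python
import numpy as np

def _recolor(colors, block_sums, decimals):
    """Assign new color ids from the signature (old color, rounded block sums)."""
    ids = {}
    new = []
    for c, sums in zip(colors, np.round(block_sums, decimals)):
        sig = (int(c),) + tuple(sums.tolist())
        new.append(ids.setdefault(sig, len(ids)))
    return np.array(new)

def color_refinement(A, row_init=None, col_init=None, decimals=9):
    """Return (row_colors, col_colors) of the coarsest equitable partition of A
    that refines the optional initial colorings (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    n, d = A.shape
    rows = (np.zeros(n, dtype=int) if row_init is None
            else np.unique(row_init, return_inverse=True)[1])
    cols = (np.zeros(d, dtype=int) if col_init is None
            else np.unique(col_init, return_inverse=True)[1])
    while True:
        R = np.eye(rows.max() + 1)[rows]   # n x p indicator of sample blocks
        C = np.eye(cols.max() + 1)[cols]   # d x q indicator of feature blocks
        # A @ C: sum of each row over every feature block.
        # A.T @ R: sum of each column over every sample block.
        new_rows = _recolor(rows, A @ C, decimals)
        new_cols = _recolor(cols, A.T @ R, decimals)
        # Each pass can only split blocks, so unchanged counts mean stability.
        if new_rows.max() == rows.max() and new_cols.max() == cols.max():
            return new_rows, new_cols
        rows, cols = new_rows, new_cols

A = [[1, 2], [2, 1], [3, 3]]
rows, cols = color_refinement(A)
print(rows.tolist(), cols.tolist())  # [0, 0, 1] [0, 0]
```

In this example the first two samples end up in one block because each sums to 3 over the single feature block, while the two features stay together because their sums over every sample block agree.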
3. Compression Mapping and Theoretical Properties
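Partition matrices encode feature blocks in $\{0,1\}^{d \times q}$ (entry $(j, b)$ is 1 exactly when feature $j$ belongs to block $b$), with their row-scaled transposes averaging variables within each block. The compression mapping for the primal ERM is:
- Regularizer expressed in terms of the compressed variables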
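To make the role of the partition matrices concrete, the sketch below builds the indicator matrix of a small feature partition and its row-scaled transpose, then averages and lifts a parameter vector. The names `partition_matrix`, `Pi`, and `Pi_avg`, and the example coloring, are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def partition_matrix(colors):
    """Return the d x q 0/1 matrix whose (j, b) entry is 1 iff item j has color b."""
    colors = np.asarray(colors)
    return np.eye(colors.max() + 1)[colors]

feature_colors = [0, 0, 1]                 # assumed example: features 0 and 1 share a block
Pi = partition_matrix(feature_colors)      # shape (3, 2)
Pi_avg = Pi.T / Pi.sum(axis=0)[:, None]    # row-scaled transpose, shape (2, 3)

theta = np.array([1.0, 3.0, 5.0])
theta_hat = Pi_avg @ theta                 # block averages: [2.0, 5.0]
theta_lifted = Pi @ theta_hat              # value copied to each member: [2.0, 2.0, 5.0]
```

Lifting followed by averaging returns the compressed vector unchanged (`Pi_avg @ Pi` is the identity), which is what allows optima to be moved between the two problems without loss.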
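Main Losslessness Theorem: If the pair of row and column partitions $(\mathcal{P}, \mathcal{Q})$ is a reduction coloring (i.e., the gradients of the objective and the constraints are constant on color-blocks), then:
- Any optimum $\theta^*$ in the original problem compresses to an optimum $\hat{\theta}^*$ in the reduced problem.
- Any optimal $\hat{\theta}^*$ for the reduced problem lifts to an optimal $\theta^*$ for the original.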
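The partition found by color refinement is at least as coarse as any partition obtained from automorphism symmetry (as used in "lifted inference"): every orbit partition is itself equitable, so it can only refine the coarsest equitable partition. The construction therefore guarantees at least as much compression. Proofs leverage convexity (by Jensen's inequality, averaging a solution within blocks does not increase the objective) and preservation of feasibility via the partition structure.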
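As a sanity check of the theorem in the simplest setting, the sketch below compares weighted ridge regression on a small matrix with a compressed counterpart. The compressed objective used here (block-summed features, summed sample weights, and a ridge penalty weighted by block size) is one construction consistent with the description above and is assumed for illustration, not quoted from the paper; the data and the hand-picked partition are made up.

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([1.0, 1.0, 2.0])
w = np.ones(3)
lam = 0.5

# Full problem: minimize sum_i w_i (x_i . theta - y_i)^2 + lam * ||theta||^2.
theta_full = np.linalg.solve(X.T @ (w[:, None] * X) + lam * np.eye(2),
                             X.T @ (w * y))

# Equitable partition chosen by hand: samples {0, 1}, {2}; features {0, 1}.
row_colors = np.array([0, 0, 1])
col_colors = np.array([0, 0])
Pr = np.eye(2)[row_colors]     # 3 x 2 sample-block indicator
Pc = np.eye(1)[col_colors]     # 2 x 1 feature-block indicator

# One representative sample per block; labels agree within each block here.
reps = [np.flatnonzero(row_colors == a)[0] for a in range(2)]
X_hat = (X @ Pc)[reps]         # features summed over each feature block
y_hat = y[reps]
w_hat = Pr.T @ w               # sample weights summed over each sample block
D = np.diag(Pc.sum(axis=0))    # block sizes: ||Pc @ t||^2 equals t.T @ D @ t

theta_hat = np.linalg.solve(X_hat.T @ (w_hat[:, None] * X_hat) + lam * D,
                            X_hat.T @ (w_hat * y_hat))
theta_lifted = Pc @ theta_hat  # copy each block value back to its features

print(np.allclose(theta_full, theta_lifted))  # True
```

Here a 3-by-2 problem reduces to a 2-by-1 problem, and the lifted solution matches the full solution, both coordinates being 18/55.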
4. Specialization to Models and Algorithmic Implementation
The framework admits efficient algorithms for several widely used convex models:
| Model | Equitability Conditions | Additional Conditions |
|---|---|---|
| Linear regression | on | is constant on each |
| Binary logistic regression | on | const on ; const on |
| Multiclass logistic | on | Replace term with counts |
| Elastic-net regression | Same as linear | Convexity ( term) via Jensen argument |
| Kernel methods | on | const on ; const in block |
The algorithm is a single color-refinement loop, specialized to each model through the signatures used to initialize the partitions. The closed-form regression solution is preserved upon reduction, and similar properties hold for kernel and logistic models.
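One way to realize such a model-specific initialization is to give two samples the same starting color only when the per-sample quantities that must be constant on a block already agree. The sketch below groups samples by the pair (label, weight); the helper name `initial_row_colors` and this particular choice of signature are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def initial_row_colors(y, w):
    """Color samples by their (label, weight) pair (hypothetical helper)."""
    sig = np.column_stack([np.asarray(y, dtype=float),
                           np.asarray(w, dtype=float)])
    _, colors = np.unique(sig, axis=0, return_inverse=True)
    # Flatten, since the shape of the inverse differs between NumPy versions.
    return np.asarray(colors).reshape(-1)

y = [1, 1, 0, 0]
w = [1.0, 1.0, 1.0, 2.0]
colors = initial_row_colors(y, w)  # samples 0 and 1 share a color; 2 and 3 do not
```

The resulting array can serve as the initial sample coloring of a refinement loop, which then only ever splits these blocks further.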
5. Empirical Evaluation and Compression Performance
Empirical evaluation on five binary classification datasets (Titanic, skin_nonskin, phishing, a7a, breast-cancer) demonstrates:
- Substantial reduction in sample and/or feature count without loss of accuracy:
| Dataset | Samples: Pre→Post | Features: Pre→Post |
|---|---|---|
| Titanic | 1309 → 1147 | 378 → 367 |
| skin_nonskin | 245057 → 51444 | unchanged |
| phishing | 11055 → 5849 | unchanged |
| a7a | 16100 → 13900 | 122 → 120 |
| breast-cancer | 683 → 675 | unchanged |
- Cumulative runtime (reduction plus training) as a fraction of baseline training:
| Dataset | Runtime (% of baseline) |
|---|---|
| Titanic | 16 |
| skin_nonskin | 28 |
| phishing | 21 |
| a7a | 91 |
| breast-cancer | 90 |
- Prediction and objective values of compressed solutions are within numerical solver tolerance of the full problem.
6. Key Conditions and Complexity Analysis
The framework's theoretical underpinnings are anchored by precise criteria:
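- Gradient equivalence: the partial derivatives of the objective agree, $\partial F / \partial \theta_j = \partial F / \partial \theta_{j'}$, for all $j, j'$ in the same block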
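- Equitable sums: for the data or kernel matrix $A$, $\sum_{j \in Q} A_{ij}$ constant over samples $i$ in a block $P$; $\sum_{i \in P} A_{ij}$ constant over features $j$ in a block $Q$
- Compression runtime: a sparse matrix yields running time quasilinear in its number of nonzero entries
- Compressed problem dimension: number of sample blocks by number of feature blocks (or dual variables in kernel methods)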
By computing the coarsest equitable partition via color refinement, lossless instance compression is achieved for a general class of differentiable convex ERM problems. The approach is at least as strong as group-theoretic symmetry identification and yields measurable end-to-end computational benefits while maintaining exact optimality (Zhu et al., 31 Jan 2026).