
Lossless Compression for Convex ERM

Updated 7 February 2026
  • The paper presents a lossless compression framework for convex ERM that generalizes color refinement to obtain exact instance compression across a range of models.
  • It preserves all global optima by ensuring that gradients and aggregated feature sums are constant within the blocks of equitable partitions.
  • Empirical evaluations on diverse datasets show substantial reductions in sample (and sometimes feature) counts and in runtime, with solutions matching the uncompressed ones to numerical tolerance.

A lossless compression framework for convex empirical risk minimization (ERM) shrinks the optimization problem that has to be solved without any loss of solution accuracy. The framework introduced by Zhu et al. is built on color refinement, a combinatorial technique from graph theory, generalized to operate on the data or kernel matrices of convex, differentiable ERMs. It yields exact instance compression for linear and polynomial regression, binary and multiclass logistic regression, elastic-net regularization, and kernelized methods, while preserving all global optima and producing measurable practical speedups (Zhu et al., 31 Jan 2026).

1. Convex ERM: Formulation and Lossless Compression

Given feature vectors $x_i \in \mathbb{R}^D$, labels $y_i$, and positive weights $v_i$, convex ERM in standard (primal) form seeks to minimize:

$$F(w, b) = \sum_{i=1}^{n} v_i\, f(x_i^T w + b;\, y_i) + R(w),$$

where $f$ is convex and differentiable in its first argument, and $R$ is a convex regularizer (e.g., ridge or elastic-net). The kernelized ERM form utilizes:

  • Kernel matrix $K \in \mathbb{R}^{n \times n}$ with $K_{ij} = k(x_i, x_j)$ and $K \succeq 0$
  • Dual variables $\alpha \in \mathbb{R}^n$ and bias $b \in \mathbb{R}$

Objective:

$$F(\alpha, b) = \sum_{i=1}^{n} v_i\, f\big((K\alpha + b\,1_n)_i;\, y_i\big) + \frac{\lambda}{2}\, \alpha^T K \alpha$$
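As a concrete reference point, the sketch below evaluates both objectives in NumPy. It makes three illustrative choices that are not the paper's code: the logistic loss $f(z; y) = \log(1 + e^{-yz})$ with labels $y \in \{-1, +1\}$, a ridge regularizer $R(w) = \frac{\lambda}{2}\lVert w \rVert^2$ for the primal form, and the function names shown. With the linear kernel $K = XX^T$ and $w = X^T\alpha$, the two forms give the same value, because $Xw = K\alpha$ and $\lVert w \rVert^2 = \alpha^T K \alpha$.

```python
import numpy as np

def logistic_loss(z, y):
    # f(z; y) = log(1 + exp(-y z)) for y in {-1, +1}, computed stably via logaddexp
    return np.logaddexp(0.0, -y * z)

def primal_objective(w, b, X, y, v, lam):
    # F(w, b) = sum_i v_i f(x_i^T w + b; y_i) + R(w), with assumed ridge R(w) = lam/2 ||w||^2
    z = X @ w + b
    return float(np.sum(v * logistic_loss(z, y)) + 0.5 * lam * (w @ w))

def kernel_objective(alpha, b, K, y, v, lam):
    # F(alpha, b) = sum_i v_i f((K alpha + b 1_n)_i; y_i) + lam/2 alpha^T K alpha
    z = K @ alpha + b
    return float(np.sum(v * logistic_loss(z, y)) + 0.5 * lam * (alpha @ K @ alpha))

# Usage: with the linear kernel K = X X^T and w = X^T alpha, both forms agree.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
v = np.ones(5)
alpha = rng.normal(size=5)
K = X @ X.T
print(primal_objective(X.T @ alpha, 0.1, X, y, v, 0.5),
      kernel_objective(alpha, 0.1, K, y, v, 0.5))
```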

A compression mapping $\varphi: (X, y) \mapsto (\tilde X, \tilde y)$ is lossless if the compressed ERM has exactly the same set of solutions as the original problem, up to the maps that compress and lift the optimization variables.

2. Color Refinement and Equitable Partitions

Color refinement represents the data or kernel matrix $A \in \mathbb{R}^{m \times n}$ as a weighted bipartite graph. Its $m$ row nodes are samples, and its $n$ column nodes are features (or, for a kernel matrix, kernel columns). Color refinement seeks a pair of partitions $(\mathcal{P}, \mathcal{Q})$ of the rows and columns that is equitable:

  • For each sample block $S \in \mathcal{P}$ and feature block $T \in \mathcal{Q}$:
    • $\sum_{j \in T} A_{ij}$ is constant over $i \in S$
    • $\sum_{i \in S} A_{ij}$ is constant over $j \in T$
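The two conditions can be checked directly for given blocks. The NumPy sketch below tests every block pair $(S, T)$. The function name `is_equitable` and the tolerance are illustrative. It is applied to a small hand-built matrix with 0-based indices.

```python
import numpy as np

def is_equitable(A, row_blocks, col_blocks, tol=1e-9):
    """Return True if (row_blocks, col_blocks) satisfy both equitability conditions on A."""
    A = np.asarray(A, dtype=float)
    for S in row_blocks:
        for T in col_blocks:
            sub = A[np.ix_(S, T)]        # submatrix A[S, T]
            row_sums = sub.sum(axis=1)   # sum_{j in T} A_ij, one value per i in S
            col_sums = sub.sum(axis=0)   # sum_{i in S} A_ij, one value per j in T
            if row_sums.max() - row_sums.min() > tol:
                return False
            if col_sums.max() - col_sums.min() > tol:
                return False
    return True

# Hand-built example: sample blocks {0,1}, {2,3}; feature blocks {0,1}, {2}
X = np.array([[1., 2., 3.],
              [2., 1., 3.],
              [1., 2., 0.],
              [2., 1., 0.]])
print(is_equitable(X, [[0, 1], [2, 3]], [[0, 1], [2]]))   # True
print(is_equitable(X, [[0, 1], [2, 3]], [[0, 1, 2]]))     # False: column sums over {0,1} are 3, 3, 6
```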

The algorithm starts from an initial coloring, grouping by label, feature statistics, or other coarse signatures. It then iteratively splits any block whose members have different aggregated weights into the current blocks on the other side. This continues until it reaches the coarsest equitable partition that refines the initial coloring. The process runs in $O(\mathrm{nnz}(A)(\log m + \log n))$ time for sparse $A$, or $O(mn(\log m + \log n))$ when $A$ is dense.
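The refinement loop can be sketched as follows. This is a deliberately naive version: each round scans the whole matrix, so it does not reach the $O(\mathrm{nnz}(A)(\log m + \log n))$ bound, which needs dedicated partition-refinement data structures. Signatures are rounded to absorb floating-point noise. The function name and rounding are illustrative choices, not the paper's implementation.

```python
import numpy as np

def color_refinement(A, row_colors, col_colors, decimals=9):
    """Naive color refinement on the weighted bipartite graph of A.

    Each round recolors a row by (its current color, its summed weight into every
    current column color), then recolors columns against the new row colors,
    until no block splits. Each round costs O(m * n).
    """
    A = np.asarray(A, dtype=float)

    def relabel(signatures):
        # Map distinct signatures to consecutive integer colors in order of first appearance
        table = {}
        return [table.setdefault(sig, len(table)) for sig in signatures]

    def refine(M, own, other):
        # own: colors of M's rows; other: colors of M's columns
        k = max(other) + 1
        sigs = []
        for r in range(M.shape[0]):
            agg = [0.0] * k
            for c in range(M.shape[1]):
                agg[other[c]] += M[r, c]
            sigs.append((own[r], tuple(round(float(a), decimals) for a in agg)))
        return relabel(sigs)

    rows, cols = relabel(row_colors), relabel(col_colors)
    while True:
        new_rows = refine(A, rows, cols)
        new_cols = refine(A.T, cols, new_rows)
        # Signatures include the old color, so equal block counts mean nothing split
        if len(set(new_rows)) == len(set(rows)) and len(set(new_cols)) == len(set(cols)):
            break
        rows, cols = new_rows, new_cols

    def blocks(colors):
        groups = {}
        for idx, c in enumerate(colors):
            groups.setdefault(c, []).append(idx)
        return list(groups.values())

    return blocks(new_rows), blocks(new_cols)

X = np.array([[1., 2., 3.],
              [2., 1., 3.],
              [1., 2., 0.],
              [2., 1., 0.]])
y = np.array([1., 2., 3., 2.])
# Assumed initialization: one sample color, feature colors from (X^T y)_j = 12, 12, 9
P, Q = color_refinement(X, [0] * 4, list(np.round(X.T @ y, 9)))
print(P, Q)  # [[0, 1], [2, 3]] [[0, 1], [2]]
```

Seeding the feature colors with $(X^T y)_j$ mirrors the linear-regression condition listed in Section 4. This initialization is an assumption made for illustration.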

3. Compression Mapping and Theoretical Properties

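Partition matrices $\Pi_\mathcal{Q} \in \{0,1\}^{n \times |\mathcal{Q}|}$ encode the feature blocks in $\mathcal{Q}$: entry $(j, k)$ is 1 exactly when feature $j$ lies in block $k$. Their row-scaled transposes $\Pi_\mathcal{Q}^{\mathrm{scaled}}$ divide each row of $\Pi_\mathcal{Q}^T$ by the corresponding block size, so they average variables within each block. $\Pi_\mathcal{P}$ and $\Pi_\mathcal{P}^{\mathrm{scaled}}$ are defined in the same way for the sample blocks in $\mathcal{P}$. The compression mapping for the primal ERM is: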

  • $\tilde X = \Pi_\mathcal{P}^{\mathrm{scaled}}\, X\, \Pi_\mathcal{Q}$
  • $\tilde y = \Pi_\mathcal{P}^{\mathrm{scaled}}\, y$
  • $\tilde v = \Pi_\mathcal{P}^T v$
  • Regularizer $R(\Pi_\mathcal{Q} w')$, expressed in terms of the compressed weights $w' \in \mathbb{R}^{|\mathcal{Q}|}$
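Here is a minimal NumPy sketch of this mapping. It uses a small hand-built $4 \times 3$ matrix whose partitions $\mathcal{P} = \{\{0,1\},\{2,3\}\}$ and $\mathcal{Q} = \{\{0,1\},\{2\}\}$ are equitable (0-based indices). The helper names `partition_matrix`, `scaled_transpose` and `compress` are hypothetical.

```python
import numpy as np

def partition_matrix(blocks, size):
    # Pi in {0,1}^{size x len(blocks)}: Pi[j, k] = 1 iff index j lies in block k
    Pi = np.zeros((size, len(blocks)))
    for k, block in enumerate(blocks):
        Pi[block, k] = 1.0
    return Pi

def scaled_transpose(Pi):
    # Divide row k of Pi^T by |block k|, so scaled_transpose(Pi) @ x averages x within blocks
    return Pi.T / Pi.sum(axis=0)[:, None]

def compress(X, y, v, row_blocks, col_blocks):
    m, n = X.shape
    Pi_P = partition_matrix(row_blocks, m)
    Pi_Q = partition_matrix(col_blocks, n)
    X_c = scaled_transpose(Pi_P) @ X @ Pi_Q   # |P| x |Q| compressed data
    y_c = scaled_transpose(Pi_P) @ y          # block-averaged labels
    v_c = Pi_P.T @ v                          # block-summed weights
    return X_c, y_c, v_c, Pi_Q

X = np.array([[1., 2., 3.],
              [2., 1., 3.],
              [1., 2., 0.],
              [2., 1., 0.]])
y = np.array([1., 2., 3., 2.])
X_c, y_c, v_c, Pi_Q = compress(X, y, np.ones(4), [[0, 1], [2, 3]], [[0, 1], [2]])
print(X_c)       # rows (3, 3) and (3, 0)
print(y_c, v_c)  # y_c = (1.5, 2.5), v_c = (2, 2)
```

The returned `Pi_Q` is what the regularizer needs: a compressed weight vector `w_c` is evaluated as `R(Pi_Q @ w_c)`.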

Main Losslessness Theorem: If $(\mathcal{P}, \mathcal{Q})$ is a reduction coloring (i.e., the gradients of $F$ and the constraints are constant on color blocks), then:

  • Any optimum $x$ of the original problem compresses to $x' = \Pi_\mathcal{Q}^{\mathrm{scaled}}\, x$, which is optimal for the reduced problem.
  • Any $x'$ optimal for the reduced problem lifts to $x = \Pi_\mathcal{Q}\, x'$, which is optimal for the original problem.

The coarsest equitable partition is never finer than the orbit partition obtained from automorphism symmetries (as used in "lifted inference"), so it yields at least as much compression. The proofs rely on convexity (averaging within blocks does not increase $F$) and on the partition structure preserving feasibility.
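The convexity step can be illustrated numerically. Let $u$ be the block average of $w$. Convexity gives $F(w) \ge F(u) + \nabla F(u)^T (w - u)$. The gradient at the block-constant point $u$ is itself block-constant, while $w - u$ sums to zero on every block, so the inner product vanishes and $F(u) \le F(w)$.

The sketch below checks this for an unweighted least-squares objective with a ridge term on a hand-built matrix. The data, the loss and the weight `lam` are illustrative assumptions. In this instance, no permutation symmetry of $(X, y)$ swaps features 0 and 1, since that would require swapping samples 0 and 1, whose labels differ. Those features nonetheless share an equitable block.

```python
import numpy as np

# P = {{0,1},{2,3}} and Q = {{0,1},{2}} are equitable for X,
# and (X^T y)_j = 12, 12, 9 is constant on each feature block.
X = np.array([[1., 2., 3.],
              [2., 1., 3.],
              [1., 2., 0.],
              [2., 1., 0.]])
y = np.array([1., 2., 3., 2.])
lam = 0.5  # illustrative ridge weight (assumption)
col_blocks = [[0, 1], [2]]

def F(w):
    # Unit-weight least squares plus ridge: sum_i (x_i^T w - y_i)^2 + lam * ||w||^2
    r = X @ w - y
    return float(r @ r + lam * (w @ w))

def block_average(w, blocks):
    # Replace each coordinate by the mean of its feature block, i.e. Pi_Q Pi_Q^scaled w
    u = w.copy()
    for T in blocks:
        u[T] = w[T].mean()
    return u

rng = np.random.default_rng(0)
for _ in range(1000):
    w = rng.normal(size=3)
    assert F(block_average(w, col_blocks)) <= F(w) + 1e-12
print("averaging within blocks never increased F")
```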

4. Specialization to Models and Algorithmic Implementation

The framework admits efficient algorithms for several widely used convex models:

Model | Equitability conditions | Additional conditions
Linear regression | $(\mathcal{P}, \mathcal{Q})$ on $X$ | $(X^T y)_j$ constant on each $T \in \mathcal{Q}$
Binary logistic regression | $(\mathcal{P}, \mathcal{Q})$ on $X$ | $\sum_i v_i X_{ij} y_i$ constant on each $T$; $v_i$ constant on each $S$
Multiclass logistic regression | $(\mathcal{P}, \mathcal{Q})$ on $X$ | As binary, with the $y_i$ term replaced by the class indicators $\mathbf{1}\{y_i = c\}$ (class counts)
Elastic-net regression | Same as linear regression | Convexity of the $\ell_1$ term, via a Jensen argument
Kernel methods | $(\mathcal{Q}, \mathcal{Q})$ on $K$ | $\sum_i v_i K_{ij} y_i$ constant on each $T$; $v_i$ constant within each block
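To see why the binary logistic row needs $v_i$ constant on each $S$, write the loss as $f(z; y) = \log(1 + e^{z}) - yz$ with $y \in \{0, 1\}$. This parametrization is an assumption here. The loss is affine in $y$, so with equal weights inside $S$ the block's contribution equals $\tilde v_S\, f(z_S; \bar y_S)$, even though the averaged label $\bar y_S$ is not a valid class label.

The hand-built sketch below checks the table's two additional conditions. It then confirms that the compressed objective at $w'$ equals the original objective at $\Pi_\mathcal{Q} w'$. The data, `lam`, the ridge regularizer and the test point are illustrative.

```python
import numpy as np

# Hand-built data: P = {{0,1},{2,3}} and Q = {{0,1},{2}} are equitable for X
X = np.array([[1., 2., 3.],
              [2., 1., 3.],
              [1., 2., 0.],
              [2., 1., 0.]])
y = np.array([1., 0., 0., 1.])   # binary labels in {0, 1}, not constant within sample blocks
v = np.ones(4)                   # sample weights
row_blocks, col_blocks = [[0, 1], [2, 3]], [[0, 1], [2]]
lam = 0.1                        # illustrative ridge weight (assumption)

def constant_on(values, blocks, tol=1e-9):
    # True if `values` is constant (up to tol) within every block
    return all(values[B].max() - values[B].min() <= tol for B in blocks)

# Additional conditions from the table for binary logistic regression
cond_weights = constant_on(v, row_blocks)             # v_i constant on each S
cond_labels = constant_on(X.T @ (v * y), col_blocks)  # sum_i v_i X_ij y_i constant on each T

def data_term(z, y, v):
    # sum_i v_i [log(1 + e^{z_i}) - y_i z_i]: the {0,1} logistic loss, affine in y_i
    return float(np.sum(v * (np.logaddexp(0.0, z) - y * z)))

Pi_P = np.zeros((4, 2))
for k, S in enumerate(row_blocks):
    Pi_P[S, k] = 1.0
Pi_Q = np.zeros((3, 2))
for k, T in enumerate(col_blocks):
    Pi_Q[T, k] = 1.0
avg_P = Pi_P.T / Pi_P.sum(axis=0)[:, None]
X_c, y_c, v_c = avg_P @ X @ Pi_Q, avg_P @ y, Pi_P.T @ v

w_c, b = np.array([0.7, -1.2]), 0.3
w = Pi_Q @ w_c                                                     # lifted block-constant weights
full = data_term(X @ w + b, y, v) + 0.5 * lam * (w @ w)
reduced = data_term(X_c @ w_c + b, y_c, v_c) + 0.5 * lam * (w @ w)  # ridge R(Pi_Q w')
print(cond_weights, cond_labels, np.isclose(full, reduced))        # True True True
```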

Algorithmically, the method runs a color-refinement loop over $(\mathcal{P}, \mathcal{Q})$ and specializes to each model through the signatures used to initialize the partitions. For linear regression, the closed-form solution $w = (X^T X)^{-1} X^T y$ is preserved under reduction: the reduced solution, lifted by $\Pi_\mathcal{Q}$, coincides with it. Analogous properties hold for the kernel and logistic models.
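The closed-form claim can be checked on a small hand-built instance with unit weights, equitable blocks $\mathcal{P} = \{\{0,1\},\{2,3\}\}$ and $\mathcal{Q} = \{\{0,1\},\{2\}\}$, and $(X^T y)_j$ constant on feature blocks. The reduced problem is solved through the weighted normal equations $(\tilde X^T \tilde V \tilde X)\, w' = \tilde X^T \tilde V \tilde y$ with $\tilde V = \mathrm{diag}(\tilde v)$. Treating the compressed ERM as this weighted least-squares problem is an assumption of the sketch, and the data are illustrative. Lifting $w'$ by $\Pi_\mathcal{Q}$ recovers the full solution $w = (5/6, 5/6, -1/3)$ from a $2 \times 2$ solve instead of a $3 \times 3$ one.

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [2., 1., 3.],
              [1., 2., 0.],
              [2., 1., 0.]])
y = np.array([1., 2., 3., 2.])
v = np.ones(4)
row_blocks, col_blocks = [[0, 1], [2, 3]], [[0, 1], [2]]

# Partition matrices and the block-averaging operator for samples
Pi_P = np.zeros((4, 2))
for k, S in enumerate(row_blocks):
    Pi_P[S, k] = 1.0
Pi_Q = np.zeros((3, 2))
for k, T in enumerate(col_blocks):
    Pi_Q[T, k] = 1.0
avg_P = Pi_P.T / Pi_P.sum(axis=0)[:, None]
X_c, y_c, v_c = avg_P @ X @ Pi_Q, avg_P @ y, Pi_P.T @ v

# Original closed form w = (X^T X)^{-1} X^T y, computed with a linear solve
w_full = np.linalg.solve(X.T @ X, X.T @ y)
# Reduced problem: weighted normal equations with V_c = diag(v_c) (assumed weighted form)
V_c = np.diag(v_c)
w_c = np.linalg.solve(X_c.T @ V_c @ X_c, X_c.T @ V_c @ y_c)

print(w_full, Pi_Q @ w_c)  # both approximately (5/6, 5/6, -1/3)
```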

5. Empirical Evaluation and Compression Performance

Empirical evaluation on five binary classification datasets (Titanic, skin_nonskin, phishing, a7a, breast-cancer) demonstrates:

  • Substantial reduction in sample and/or feature count without loss of accuracy:
    Dataset | Samples (before → after) | Features (before → after)
    Titanic | 1309 → 1147 | 378 → 367
    skin_nonskin | 245057 → 51444 | unchanged
    phishing | 11055 → 5849 | unchanged
    a7a | 16100 → 13900 | 122 → 120
    breast-cancer | 683 → 675 | unchanged
  • Cumulative runtime (reduction plus training) as a percentage of baseline training time:
    Dataset | Runtime (% of baseline)
    Titanic | 16
    skin_nonskin | 28
    phishing | 21
    a7a | 91
    breast-cancer | 90
  • Prediction and objective values of compressed solutions are within numerical solver tolerance of the full problem.
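A quick calculation on the reported counts relates the two tables. skin_nonskin keeps about 21% of its samples and phishing about 53%, in line with their large speedups. Titanic keeps about 88% of its samples yet runs in 16% of baseline time, so sample reduction alone does not account for the runtime figures. The dictionaries below simply copy the table values.

```python
# Sample counts (before, after) and runtime percentages, copied from the tables above
samples = {
    "Titanic": (1309, 1147),
    "skin_nonskin": (245057, 51444),
    "phishing": (11055, 5849),
    "a7a": (16100, 13900),
    "breast-cancer": (683, 675),
}
runtime_pct = {"Titanic": 16, "skin_nonskin": 28, "phishing": 21, "a7a": 91, "breast-cancer": 90}

# Fraction of samples retained after compression
retained = {name: after / before for name, (before, after) in samples.items()}
for name, frac in retained.items():
    print(f"{name:14s} samples kept {frac:6.1%}   runtime {runtime_pct[name]:3d}% of baseline")
```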

6. Key Conditions and Complexity Analysis

The framework's theory rests on the following conditions and bounds:

  • Gradient equivalence: $\nabla_{j_1} F(\hat x) = \nabla_{j_2} F(\hat x)$ for all $j_1, j_2$ in the same block, at points $\hat x$ that are constant on blocks
  • Equitable sums: $\sum_{j \in T} X_{ij}$ is constant over $i \in S$, and $\sum_{i \in S} X_{ij}$ is constant over $j \in T$
  • Compression runtime: $O(\mathrm{nnz}(A)(\log m + \log n))$ for sparse $A$
  • Compressed problem dimension: $|\mathcal{P}|$ samples by $|\mathcal{Q}|$ features (or $|\mathcal{Q}|$ dual variables in kernel methods)
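Gradient equivalence can be checked numerically at a block-constant point $\hat x = \Pi_\mathcal{Q} w'$. The sketch uses an unweighted least-squares objective with a ridge term on a hand-built matrix whose blocks $\{0,1\}$ and $\{2\}$ come from an equitable partition; the data, `lam` and the point $w'$ are illustrative assumptions. With labels satisfying the linear-regression condition, the gradient entries for features 0 and 1 agree. Changing one label so that $(X^T y)_0 \ne (X^T y)_1$ breaks the equivalence, even though $X$ and its partition are unchanged.

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [2., 1., 3.],
              [1., 2., 0.],
              [2., 1., 0.]])
lam = 0.5                  # illustrative ridge weight (assumption)
col_blocks = [[0, 1], [2]]

def grad(w, y):
    # Gradient of F(w) = sum_i (x_i^T w - y_i)^2 + lam * ||w||^2
    return 2.0 * X.T @ (X @ w - y) + 2.0 * lam * w

def lift(w_c, blocks, n):
    # Block-constant point x_hat = Pi_Q w_c
    w = np.empty(n)
    for k, T in enumerate(blocks):
        w[T] = w_c[k]
    return w

x_hat = lift(np.array([0.4, -0.9]), col_blocks, X.shape[1])
g_good = grad(x_hat, np.array([1., 2., 3., 2.]))  # (X^T y)_j = 12, 12, 9: constant on blocks
g_bad = grad(x_hat, np.array([1., 1., 3., 2.]))   # (X^T y)_j = 10, 11, 6: differs on {0, 1}
print(g_good[0] - g_good[1], g_bad[0] - g_bad[1])  # approximately 0 and 2
```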

By computing the coarsest equitable partition via color refinement, the framework achieves lossless instance compression for a general class of differentiable convex ERM problems. The approach is at least as strong as group-theoretic symmetry identification and yields measurable end-to-end computational benefits while maintaining exact optimality (Zhu et al., 31 Jan 2026).
