Lossless Compression for Convex ERM

Updated 7 February 2026

The paper demonstrates a lossless compression framework for convex ERM by generalizing color refinement to obtain exact instance compression across models.
It preserves all global optima by ensuring that gradients and aggregated feature sums remain constant across equitable partitions.
Empirical evaluations on diverse datasets show substantial reductions in data dimensions and runtime while maintaining numerical accuracy.

A lossless compression framework for convex empirical risk minimization (ERM) enables computational reductions in optimization tasks without loss of solution accuracy. The framework introduced by Zhu & Chen is built on color refinement—a combinatorial technique from graph theory—generalized to operate on data or kernel matrices associated with convex, differentiable ERMs. It yields exact instance compression for problems including linear and polynomial regression, (multi)class logistic regression, elastic-net regularization, and kernelized methods, while preserving all global optima and producing measurable practical speedups (Zhu et al., 31 Jan 2026).

1. Convex ERM: Formulation and Lossless Compression

Given feature vectors $x_i \in \mathbb{R}^D$ , labels $y_i$ , and positive weights $v_i$ , convex ERM in standard (primal) form seeks to minimize:

$F(w, b) = \sum_{i=1}^n v_i\,f(x_i^T w + b;\;y_i) + R(w),$

where $f$ is convex and differentiable in its first argument, and $R$ is a convex regularizer (e.g., ridge or elastic-net). The kernelized ERM form utilizes:

Kernel matrix $K_{ij}=k(x_i,x_j)\succeq0$
Dual variables $\alpha \in \mathbb{R}^n$ , bias $b \in \mathbb{R}$

Objective:

$F(\alpha, b) = \sum_{i=1}^n v_i\,f((K\alpha + b 1_n)_i ; y_i) + \frac{\lambda}{2} \alpha^T K \alpha$

A compression mapping $\varphi:(X, y) \mapsto (\tilde X, \tilde y)$ is lossless if the compressed ERM has exactly the same set of solutions as the original problem.
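The simplest instance of this definition can be checked numerically. The sketch below is an illustration only and is not the paper's method: it assumes a logistic loss with labels in {-1, +1} and a ridge regularizer, and the helper names (`primal_objective`, `merge_duplicates`) are hypothetical. It merges exactly duplicated samples by adding their weights, which leaves $F(w, b)$ unchanged for every $(w, b)$ and therefore leaves the solution set unchanged.

```python
import numpy as np

def logistic_loss(z, y):
    # f(z; y) = log(1 + exp(-y * z)) for labels y in {-1, +1}
    return np.logaddexp(0.0, -y * z)

def primal_objective(X, y, v, w, b, lam=0.1):
    # F(w, b) = sum_i v_i f(x_i^T w + b; y_i) + R(w), with ridge R(w) = (lam / 2) ||w||^2
    return float(np.sum(v * logistic_loss(X @ w + b, y)) + 0.5 * lam * (w @ w))

def merge_duplicates(X, y, v):
    # Group identical (x_i, y_i) rows and add up their weights.
    keyed = np.column_stack([X, y])
    uniq, inverse = np.unique(keyed, axis=0, return_inverse=True)
    v_merged = np.bincount(inverse.ravel(), weights=v, minlength=len(uniq))
    return uniq[:, :-1], uniq[:, -1], v_merged

# Toy data (assumed for illustration): three of the four samples are identical.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, 1.0])
v = np.ones(4)
X_c, y_c, v_c = merge_duplicates(X, y, v)

# Evaluate both objectives at an arbitrary point; they agree up to rounding.
rng = np.random.default_rng(0)
w, b = rng.normal(size=2), 0.3
full = primal_objective(X, y, v, w, b)
small = primal_objective(X_c, y_c, v_c, w, b)
```

Color refinement goes further than duplicate detection: it can also merge samples and features that are not identical but are interchangeable with respect to block-wise sums.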

2. Color Refinement and Equitable Partitions

Color refinement represents the data or kernel matrix $A \in \mathbb{R}^{m \times n}$ as a weighted bipartite graph whose $m$ rows are samples and whose $n$ columns are features (or kernel columns). It seeks a pair of partitions $(\mathcal{P}, \mathcal{Q})$ , of rows and columns respectively, that is equitable:

For each sample-block $S \in \mathcal{P}$ and feature-block $T \in \mathcal{Q}$ :
- $\sum_{j \in T} A_{ij}$ is constant over $i \in S$
- $\sum_{i \in S} A_{ij}$ is constant over $j \in T$

The algorithm starts from coarse blocks, formed by grouping on labels, feature statistics, or other coarse signatures, and repeatedly splits any block whose members disagree on their block-wise row or column sums. It stops at the coarsest equitable partition that refines the initial grouping. The process has time complexity $O(\mathrm{nnz}(A)(\log m+\log n))$ for sparse $A$ or $O(mn(\log m+\log n))$ when $A$ is dense.
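The refinement loop can be sketched as follows. This is a naive dense version written for readability, not the near-linear-time algorithm whose complexity is quoted above; the function names and the rounding parameter `decimals` (used to compare floating-point sums) are assumptions of this sketch.

```python
import numpy as np

def _relabel(signatures):
    # Map each distinct signature to a consecutive integer color (first seen = 0).
    table = {}
    return np.array([table.setdefault(s, len(table)) for s in signatures])

def color_refinement(A, row_init=None, col_init=None, decimals=9):
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Initial colors: a single block unless coarse signatures (e.g. labels) are given.
    rows = _relabel([(c,) for c in (row_init if row_init is not None else [0] * m)])
    cols = _relabel([(c,) for c in (col_init if col_init is not None else [0] * n)])
    while True:
        P = np.eye(rows.max() + 1)[rows]          # m x |P| block indicators
        Q = np.eye(cols.max() + 1)[cols]          # n x |Q| block indicators
        row_sums = np.round(A @ Q, decimals)      # sum of each row over each column block
        col_sums = np.round(A.T @ P, decimals)    # sum of each column over each row block
        # New color = old color plus the vector of block sums, so blocks only ever split.
        new_rows = _relabel([(rows[i], *row_sums[i]) for i in range(m)])
        new_cols = _relabel([(cols[j], *col_sums[j]) for j in range(n)])
        if new_rows.max() == rows.max() and new_cols.max() == cols.max():
            return rows, cols                     # no block was split: equitable
        rows, cols = new_rows, new_cols

# Toy matrix (assumed): rows 0 and 1 are permutations of each other.
A = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
row_colors, col_colors = color_refinement(A)
```

On this toy matrix the first two rows end up in one block and the two columns in another, even though no two rows or columns are identical.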

3. Compression Mapping and Theoretical Properties

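Partition matrices $\Pi_\mathcal{Q} \in \{0,1\}^{n \times q}$ , with $q = |\mathcal{Q}|$ , encode membership of the $n$ columns in the feature blocks of $\mathcal{Q}$ ; $\Pi_\mathcal{P}$ is defined analogously for the sample blocks of $\mathcal{P}$ . Their row-scaled transposes $\Pi^{\mathrm{scaled}}$ average the entries within each block. The compression mapping for the primal ERM is: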

$\tilde X = \Pi_\mathcal{P}^{\mathrm{scaled}} X \Pi_\mathcal{Q}$
$\tilde y = \Pi_\mathcal{P}^{\mathrm{scaled}} y$
$\tilde v = \Pi_\mathcal{P}^T v$
Regularizer $R(\Pi_\mathcal{Q}w')$ in terms of compressed $w' \in \mathbb{R}^{|\mathcal{Q}|}$
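A minimal sketch of this mapping, assuming block memberships are given as integer color arrays with every color in use; the helper names are hypothetical. It also checks the identity that makes the mapping work: for an equitable partition, the predictions $X \Pi_\mathcal{Q} w'$ of a lifted weight vector equal the compressed predictions $\tilde X w'$ copied back to every sample in each block.

```python
import numpy as np

def partition_matrix(colors):
    # Pi[i, c] = 1 iff item i belongs to block c (assumes colors 0..max are all used).
    colors = np.asarray(colors)
    return np.eye(colors.max() + 1)[colors]

def scaled_transpose(Pi):
    # Row-scaled transpose: row c averages over the members of block c.
    return Pi.T / Pi.sum(axis=0)[:, None]

def compress_primal(X, y, v, row_colors, col_colors):
    Pi_P, Pi_Q = partition_matrix(row_colors), partition_matrix(col_colors)
    X_c = scaled_transpose(Pi_P) @ X @ Pi_Q   # block-averaged rows, block-summed columns
    y_c = scaled_transpose(Pi_P) @ y          # block-averaged labels
    v_c = Pi_P.T @ v                          # block-summed sample weights
    return X_c, y_c, v_c, Pi_P, Pi_Q

# Toy instance (assumed) with equitable partition rows {0,1},{2} and columns {0,1}.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([1.0, 1.0, 2.0])
v = np.ones(3)
X_c, y_c, v_c, Pi_P, Pi_Q = compress_primal(X, y, v, [0, 0, 1], [0, 0])

w_c = np.array([0.7])                         # arbitrary compressed weight vector
lifted_pred = X @ (Pi_Q @ w_c)                # predictions of the lifted weights
copied_pred = Pi_P @ (X_c @ w_c)              # compressed predictions copied per block
```

Here the compressed instance has two samples and one feature, and the two prediction vectors coincide.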

Main Losslessness Theorem: If $(\mathcal{P}, \mathcal{Q})$ is a reduction coloring (i.e., the gradients of $F$ and constraints are constant on color-blocks), then:

Any optimum $x$ in the original problem compresses to $x' = \Pi_\mathcal{Q}^{\mathrm{scaled}}x$ in the reduced problem.
Any $x'$ optimal for the reduced problem lifts to $x = \Pi_\mathcal{Q} x'$ optimal for the original.

The resulting partition is never finer than the orbit partition induced by automorphism symmetries ("lifted inference"), so it yields at least as much compression. Proofs leverage convexity (averaging within blocks does not increase $F$ ) and preservation of feasibility via the partition structure.

4. Specialization to Models and Algorithmic Implementation

The framework admits efficient algorithms for several widely used convex models:

Model	Equitability Conditions	Additional Conditions
Linear regression	$(\mathcal{P}, \mathcal{Q})$ on $X$	$(X^T y)_j$ is constant on each $T \in \mathcal{Q}$
Binary logistic regression	$(\mathcal{P}, \mathcal{Q})$ on $X$	$\sum_i v_i X_{ij} y_i$ const on $T$ ; $v_i$ const on $S$
Multiclass logistic	$(\mathcal{P}, \mathcal{Q})$ on $X$	Replace $y_i$ term with $\mathbf{1}\{y_i=c\}$ counts
Elastic-net regression	Same as linear	Convexity ( $\ell_1$ term) via Jensen argument
Kernel methods	$(\mathcal{Q}, \mathcal{Q})$ on $K$	$\sum_i v_i K_{ij} y_i$ const on $T$ ; $v_i$ const in block

The algorithm runs a color-refinement loop over $(\mathcal{P}, \mathcal{Q})$ and is specialized to each model through the signatures used to initialize the partitions. The closed-form regression solution $w = (X^T X)^{-1} X^T y$ is preserved upon reduction, and similar properties hold for kernel and logistic models.
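The preservation of the closed-form solution can be illustrated on a three-sample toy instance. The data and the hard-coded partition below are assumptions chosen so that the linear-regression conditions in the table hold (the partition is equitable on $X$ and $X^T y$ is constant on the single feature block). The compressed problem is solved as a weighted least-squares problem with weights $\tilde v$, since blocks of different sizes contribute differently to the loss.

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([1.0, 1.0, 2.0])

# Assumed equitable partition: sample blocks {0, 1}, {2}; one feature block {0, 1}.
Pi_P = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Pi_Q = np.array([[1.0], [1.0]])
Pi_P_scaled = Pi_P.T / Pi_P.sum(axis=0)[:, None]

# Full problem: ordinary least squares via the normal equations.
w_full = np.linalg.solve(X.T @ X, X.T @ y)

# Compressed problem: 2 samples, 1 feature, block sizes as sample weights.
X_c = Pi_P_scaled @ X @ Pi_Q
y_c = Pi_P_scaled @ y
v_c = Pi_P.T @ np.ones(len(y))
V = np.diag(v_c)
w_c = np.linalg.solve(X_c.T @ V @ X_c, X_c.T @ V @ y_c)

# Lift the compressed solution back to the original feature space.
w_lifted = Pi_Q @ w_c
```

The lifted vector matches the full least-squares solution up to rounding, while the linear system solved shrinks from 2 x 2 to 1 x 1.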

5. Empirical Evaluation and Compression Performance

Empirical evaluation on five binary classification datasets (Titanic, skin_nonskin, phishing, a7a, breast-cancer) demonstrates:

Substantial reduction in sample and/or feature count without loss of accuracy:

Dataset	Samples: Pre→Post	Features: Pre→Post
Titanic	1309 → 1147	378 → 367
skin_nonskin	245057 → 51444	unchanged
phishing	11055 → 5849	unchanged
a7a	16100 → 13900	122 → 120
breast-cancer	683 → 675	unchanged

Cumulative runtime (reduction plus training) as a fraction of baseline training:

Dataset	Runtime (% of baseline)
Titanic	16
skin_nonskin	28
phishing	21
a7a	91
breast-cancer	90

Prediction and objective values of compressed solutions are within numerical solver tolerance of the full problem.

6. Key Conditions and Complexity Analysis

The framework's theoretical underpinnings are anchored by precise criteria:

Gradient equivalence: $\nabla_{j_1} F(\hat x) = \nabla_{j_2} F(\hat x)$ for all $j_1, j_2$ in the same block
Equitable sums: $\sum_{j\in T} X_{ij}$ constant over $i \in S$ ; $\sum_{i\in S} X_{ij}$ constant over $j \in T$
Compression runtime: sparse $A$ yields $O(\mathrm{nnz}(A)(\log m+\log n))$
Compressed problem dimension: $|\mathcal{P}|$ samples by $|\mathcal{Q}|$ features (or $|\mathcal{Q}|$ dual variables in kernel methods)

By computing the coarsest equitable partition via color refinement, lossless instance compression is achieved for a general class of differentiable convex ERM problems. The approach is at least as strong as group-theoretic symmetry identification and yields measurable end-to-end computational benefits while maintaining exact optimality (Zhu et al., 31 Jan 2026).

References (1)

Exact Instance Compression for Convex Empirical Risk Minimization via Color Refinement (2026)