
Bilateral Distribution Compression

Updated 23 September 2025
  • Bilateral Distribution Compression is a paradigm that simultaneously compresses the sample and feature axes to maintain distributional fidelity.
  • It uses a two-stage process: training a distribution-sensitive autoencoder to learn a latent representation and then compressing the latent space via EMMD minimization.
  • A theoretical bound guarantees that controlling RMMD and EMMD keeps the decoded distribution error (DMMD) low, enabling scalable, accurate data coresets.

Bilateral Distribution Compression (BDC) is a data compression paradigm in which both sample size and ambient feature dimensionality are reduced, subject to rigorous distributional fidelity constraints. Unlike classical approaches that act solely on the sample axis (subsampling, coreset selection) or the feature axis (dimensionality reduction), BDC seeks to maintain the original data distribution through a two-stage optimization: first, it learns a low-dimensional latent embedding sensitive to distributional structure; second, it constructs a compressed set in latent space whose decoded images closely approximate the distribution of the original data. This is formalized through the control of Maximum Mean Discrepancy (MMD) distances measured at key points in the compression pipeline.

1. Motivation and Foundations

Traditional distribution compression methods (e.g., kernel herding, greedy MMD minimization, ambient coreset selection) operate within the full ambient space and generally scale poorly when both the number of samples (n) and feature dimension (d) are large. Methods such as PCA offer dimensionality reduction, but unless combined with targeted sampling, downstream models may lose critical distributional properties. BDC is designed for the modern regime in which both axes are large and potentially redundant, enabling linear time and memory scaling in n and d via latent-space compression.

2. Central Metrics: RMMD, EMMD, DMMD

BDC introduces three kernel-based distributional distances, each serving a distinct function:

Metric | Purpose                              | Domain
RMMD   | Measure embedding fidelity           | Ambient/latent
EMMD   | Quantify compressed set alignment    | Latent
DMMD   | Evaluate decoded distribution error  | Ambient

  • Reconstruction MMD (RMMD):

\mathrm{RMMD}(P_X, P_{\varphi(\psi(X))})

measures the distance between the original distribution P_X and its reconstruction \varphi(\psi(X)) by the encoder \psi and decoder \varphi. In the linear case (projection V subject to V^\top V = I), minimizing RMMD induces a subspace akin to principal components under quadratic kernels.

  • Encoded MMD (EMMD):

\mathrm{EMMD}(P_{\psi(X)}, P_Z)

quantifies the distributional discrepancy between embedded data and the proposed compressed set Z in latent space.

  • Decoded MMD (DMMD):

\mathrm{DMMD}(P_X, P_{\varphi(Z)})

is the ultimate measure of fidelity between the original and decoded compressed sets, guiding the construction of coresets faithful to the empirical or target distribution.

A key theoretical result is:

\mathrm{DMMD}(P_X, P_{\varphi(Z)}) \leq \mathrm{RMMD}(P_X, P_{\varphi(\psi(X))}) + \mathrm{EMMD}(P_{\psi(X)}, P_Z)

which ensures that controlling RMMD and EMMD suffices for overall fidelity.
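This triangle-type bound can be checked numerically. The sketch below is illustrative only: it uses a biased Gaussian-kernel MMD estimator, a random orthonormal linear map as a stand-in for a trained encoder/decoder pair, and a random subsample as the compressed set (none of these choices come from the paper):

```python
import numpy as np

def mmd(X, Y, gamma=0.5):
    # Biased empirical MMD: the RKHS norm ||mu_X - mu_Y||_H under a Gaussian kernel.
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    val = k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
    return float(np.sqrt(max(val, 0.0)))

rng = np.random.default_rng(0)
n, d, p, m = 200, 10, 3, 20
X = rng.standard_normal((n, d))

# Hypothetical linear encoder/decoder: a random orthonormal projection V (V^T V = I).
V, _ = np.linalg.qr(rng.standard_normal((d, p)))
encode = lambda A: A @ V
decode = lambda B: B @ V.T

# A naive compressed set: a random subsample of the encoded data.
Z = encode(X)[rng.choice(n, size=m, replace=False)]

rmmd = mmd(X, decode(encode(X)))          # reconstruction fidelity
emmd = mmd(decode(encode(X)), decode(Z))  # EMMD under the pulled-back kernel
dmmd = mmd(X, decode(Z))                  # decoded distribution error

assert dmmd <= rmmd + emmd + 1e-9         # the certified bound holds
```

With the latent kernel pulled back through the decoder, the bound is exactly the triangle inequality on kernel mean embeddings, so it holds for any encoder/decoder pair, not just trained ones.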

3. Two-Stage Compression Procedure

Stage 1: Distributional Autoencoder Training

  • An encoder \psi: \mathbb{R}^d \rightarrow \mathbb{R}^p and decoder \varphi: \mathbb{R}^p \rightarrow \mathbb{R}^d are trained jointly on X to minimize RMMD (possibly in convex combination with mean squared reconstruction error, MSRE).
  • For linear models, optimization is performed on the Stiefel manifold; for nonlinear cases (BDC-NL), standard neural architectures are used under bottleneck constraints.
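For intuition on the linear case: under the constraint V^\top V = I, the reconstruction-optimal subspace is the principal subspace, obtainable directly from an SVD rather than manifold optimization. A small sketch with made-up data and dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 500, 5, 2
# Data with approximately 2-D linear structure embedded in 5 dimensions.
X = rng.standard_normal((n, p)) @ np.array([[3.0, 0.0, 0.0, 0.0, 0.0],
                                            [0.0, 1.0, 0.0, 0.0, 0.0]])
X += 0.1 * rng.standard_normal((n, d))    # small off-subspace noise
X -= X.mean(axis=0)

# MSE-optimal orthonormal encoder: top-p right singular vectors (PCA subspace).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:p].T                              # d x p, with V^T V = I
recon = X @ V @ V.T                       # decode(encode(X)) for the linear pair

rel_err = np.linalg.norm(X - recon) / np.linalg.norm(X)  # small: data is near-planar
```

Since most of the variance lies in the planted 2-D subspace, the relative reconstruction error stays small, mirroring the RMMD-optimal linear embedding under a quadratic kernel.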

Stage 2: Latent Set Compression

  • Encoded representations Z_{data} = \psi(X) are computed.
  • A compressed set Z = \{z_1, \ldots, z_m\} \subset \mathbb{R}^p (m \ll n) is initialized (e.g., by sampling latent codes) and optimized to minimize EMMD w.r.t. Z_{data}; the latent kernel h(z, z') may be pulled back from the ambient kernel k(\varphi(z), \varphi(z')).
  • The decoded set \varphi(Z) constitutes the final compressed dataset.
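A minimal stand-in for Stage 2 is sketched below, using greedy herding-style selection over the encoded points instead of the gradient-based EMMD minimization described above (the kernel choice and bandwidth are illustrative assumptions):

```python
import numpy as np

def gauss(A, B, gamma=0.5):
    # Gaussian kernel matrix between row sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_compress(Zdata, m, gamma=0.5):
    """Greedily grow a size-m latent set minimizing the (biased) squared MMD
    to Zdata -- a herding-style stand-in for gradient-based EMMD descent."""
    chosen = []
    for _ in range(m):
        best_i, best_val = None, np.inf
        for i in range(len(Zdata)):
            if i in chosen:
                continue
            S = Zdata[chosen + [i]]
            # Squared MMD up to a constant (the Zdata-Zdata term is fixed).
            val = gauss(S, S, gamma).mean() - 2.0 * gauss(S, Zdata, gamma).mean()
            if val < best_val:
                best_i, best_val = i, val
        chosen.append(best_i)
    return Zdata[chosen]

rng = np.random.default_rng(1)
Zdata = rng.standard_normal((100, 3))   # stand-in for psi(X)
Z = greedy_compress(Zdata, m=10)        # compressed latent set, m << n
```

Selecting from the encoded points keeps the sketch simple; the method itself optimizes free latent locations, which generally achieves lower EMMD than any subset.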

4. Complexity, Performance, and Practical Advantages

The BDC pipeline offers linear scaling (\mathcal{O}(nd)) in both sample size and dimension. Empirical studies across synthetic manifolds, images (MNIST, CT-Slice), and real-world regression/classification settings demonstrate that BDC matches or outperforms ambient-space compression (ADC) and kernel herding, particularly in preserving distributional features (i.e., DMMD scores).

Key empirical findings:

  • BDC maintains distributional fidelity even with aggressive reduction in both n and d.
  • Under a quadratic kernel, the linear autoencoder (BDC-L) recovers PCA's principal subspaces; the nonlinear autoencoder (BDC-NL) achieves finer manifold alignment.
  • In supervised extensions (via tensor-product kernels), joint features and responses are compressed (RJMMD, EJMMD, DJMMD), facilitating label-preserving coresets.
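The joint (feature, label) discrepancies mentioned above are built from a tensor-product kernel. A hedged sketch of the construction, where the Gaussian factor kernels and bandwidths are assumptions for illustration:

```python
import numpy as np

def gauss(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def joint_mmd2(X1, Y1, X2, Y2, gx=0.5, gy=0.5):
    """Biased squared MMD under the tensor-product kernel
    k((x, y), (x', y')) = k_X(x, x') * k_Y(y, y')."""
    k11 = gauss(X1, X1, gx) * gauss(Y1, Y1, gy)
    k22 = gauss(X2, X2, gx) * gauss(Y2, Y2, gy)
    k12 = gauss(X1, X2, gx) * gauss(Y1, Y2, gy)
    return k11.mean() + k22.mean() - 2.0 * k12.mean()

rng = np.random.default_rng(2)
X, Y = rng.standard_normal((80, 4)), rng.standard_normal((80, 1))
# The joint discrepancy of a sample against itself is (numerically) zero.
zero = joint_mmd2(X, Y, X, Y)
```

Because the product kernel characterizes joint distributions (when its factors are characteristic), driving this quantity down preserves feature-label dependence, which is what makes label-preserving coresets possible.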

5. Theoretical Guarantees

BDC is supported by theoretical certification:

  • If both RMMD and EMMD are sufficiently small (ideally, tending to zero), DMMD is also small, ensuring the decoded set is distributionally close to the source.
  • The bound

\mathrm{DMMD}(P_X, P_{\varphi(Z)}) \leq \mathrm{RMMD}(P_X, P_{\varphi(\psi(X))}) + \mathrm{EMMD}(P_{\psi(X)}, P_Z)

serves as the main guarantee, motivating the two-stage minimization.

6. Extensions, Limitations, and Application Domains

  • Supervised, Semi-supervised, or Conditional Compression: By employing tensor-product kernels, BDC can be adapted to preserve joint feature-label distributions.
  • Manifold assumption: BDC is most effective when data lies approximately on a low-dimensional manifold. When manifold structure is absent, the effectiveness of feature compression may degrade.
  • Architecture: Linear BDC (BDC-L) suffices and is preferred when manifold structure is simple, while nonlinear BDC (BDC-NL) targets more complex geometries but may introduce training variability.

BDC generalizes and subsumes several paradigms:

  • Ambient Distribution Compression (ADC): Operates on full-dimensional data; BDC is designed to outperform ADC when manifold structure is present.
  • Distributional autoencoding: BDC’s RMMD loss aligns with recent trends in distribution-preserving compression and Bayesian data summarization (Harth-Kitzerow et al., 2020).
  • Distributed Compression and Distributed Detection: In two-node detection systems, BDC-style separation of compression axes aligns with rate-error-distortion tradeoffs (Katz et al., 2016), adaptive entropy bottlenecks (Ulhaq et al., 18 Jun 2024), and bi-directional channel models in federated learning (Egger et al., 31 Jan 2025).

7. Conclusion

Bilateral Distribution Compression operationalizes simultaneous sample and dimensionality reduction subject to rigorous control of decoded distributional error. By minimizing MMD-based discrepancies at embedding, compression, and decoding points, BDC yields coresets that are compact yet highly representative of the original dataset. These properties make BDC suitable for scalable machine learning pipelines, distributional modeling, and robust downstream analysis in both unsupervised and supervised scenarios (Broadbent et al., 22 Sep 2025).
