Representation-Based Compression

Updated 26 January 2026
  • Representation-based compression is a method that encodes data via learned or engineered representations, such as neural network weights, embeddings, or geometric primitives.
  • It exploits global data structure to reduce redundancy, supporting efficient storage, rapid decoding, and task-specific applications across images, videos, and 3D models.
  • The approach involves key steps like representation learning, quantization, and entropy modeling to achieve significant bitrate reductions while maintaining high fidelity.

Representation-based compression is an approach to data reduction that replaces explicit, sample-wise encoding with a compact description in terms of a learned or engineered intermediate representation—such as embeddings, neural network weights, geometric primitives, or sparse codes—together with the means to decode the target signal with high fidelity. By modeling data structure, semantics, or dynamics directly in the representation, these methods aim to reduce redundancy more effectively than conventional transform or pixel-domain codecs, while sometimes enabling additional properties such as rapid decoding, flexible query support, or alignment with downstream tasks.

1. Fundamental Principles and Motivation

Representation-based compression (RBC) is defined by its reliance on a compact, data-adaptive representation instead of or in addition to direct signal quantization. Rather than storing or transmitting the original data directly (as raster images, vectors, or videos), RBC encodes a learned mapping, parameter vector, or collection of interpretable primitives. This yields a bitstream or compressed artifact consisting of:

  • The representation parameters (e.g., neural network weights, geometric atom coefficients, dictionary indices)
  • Any necessary side information for decoding (e.g., quantization steps, codebooks, or entropy model parameters)
  • Optionally: explicit reconstruction error or correction codes (for lossy/lossless trade-offs)
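As an illustration, the three components above can be bundled into a single artifact. The sketch below is a minimal, hypothetical container; all field and function names are illustrative and not taken from any cited codec:

```python
import numpy as np

# Hypothetical container for a representation-based bitstream; field names
# are illustrative, not from any specific codec.
def pack_artifact(weights, step, codebook=None, residual=None):
    """Quantize representation parameters and bundle decoding side information."""
    q = np.round(weights / step).astype(np.int32)   # representation parameters
    return {
        "params": q,            # quantized representation parameters
        "step": step,           # side information needed to dequantize
        "codebook": codebook,   # optional entropy-model side information
        "residual": residual,   # optional explicit correction signal
    }

def unpack_artifact(artifact):
    """Dequantize; add back the residual correction if one was transmitted."""
    w = artifact["params"].astype(np.float64) * artifact["step"]
    if artifact["residual"] is not None:
        w = w + artifact["residual"]
    return w
```

Without a residual, the dequantization error is bounded by half the step size, which is the basic knob for lossy/lossless trade-offs.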

The main theoretical advantages of RBC are:

  • Ability to exploit global structure and redundancies not easily captured by block-based or sample-wise techniques
  • Storage and decoding complexity scale with the complexity of the representation rather than the raw input size
  • Potential for task-oriented or semantic compression, as in analytics pipelines or feature transmission

Applications span images, video, 3D geometry, text embeddings, neural network weights, and specialist domains such as channel state information or scientific data (Kwan et al., 2023, Lee et al., 6 Mar 2025, Hu et al., 2021, Škrlj et al., 2021, Kwan et al., 2024, Li et al., 22 Dec 2025, Chou et al., 2016).

2. Representation Modalities and Mathematical Frameworks

RBC encompasses a range of mathematical constructs and model types, including:

Implicit Neural Representations (INRs)

A target signal (e.g., image, video, light field) is overfit by a neural function $f_\theta$ (e.g., an MLP or hybrid CNN) mapping coordinates to signal values. The compressed form is the quantized weight vector $\theta$. Reconstruction at the receiver involves evaluating $f_\theta$ to recover any or all data points. This paradigm supports both spatial (images) and spatiotemporal (video/light field) domains and can employ hierarchical, convolutional, or positional-encoded architectures for improved efficiency and rate–distortion performance (Kwan et al., 2023, Kwan et al., 2024, Tang et al., 10 Feb 2025, Zhang et al., 17 Oct 2025, Wang et al., 2024).
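As a toy illustration of the idea (not any cited method), the sketch below "overfits" a linear readout over fixed sinusoidal positional features to a 1-D signal — a stand-in for training an MLP $f_\theta$ — and treats the quantized weights, together with the quantization step, as the compressed artifact:

```python
import numpy as np

# Toy 1-D "signal" on a coordinate grid (stands in for an image's pixels).
x = np.linspace(0.0, 1.0, 256, endpoint=False)
signal = np.sin(2 * np.pi * 3 * x) + 0.5 * np.sin(2 * np.pi * 7 * x)

# Positional encoding of the coordinate: fixed sinusoidal features. A real
# INR feeds such features into an MLP; here a linear readout suffices.
ks = np.arange(1, 17)
feats = np.concatenate([np.sin(2 * np.pi * np.outer(x, ks)),
                        np.cos(2 * np.pi * np.outer(x, ks))], axis=1)

# "Overfit" the representation to this one signal (least squares stands in
# for gradient-descent training), then quantize the weights: the pair
# (theta_q, step) is the compressed artifact.
theta, *_ = np.linalg.lstsq(feats, signal, rcond=None)
step = 0.03
theta_q = np.round(theta / step).astype(np.int32)

# Decoding = evaluating the fitted function at the query coordinates.
recon = feats @ (theta_q * step)
mse = np.mean((recon - signal) ** 2)
psnr = 10 * np.log10(1.5 ** 2 / mse)   # 1.5 bounds the signal's peak amplitude
```

Coarsening `step` shrinks the integer weights (fewer bits after entropy coding) at the cost of reconstruction fidelity — the same trade-off the INR codecs cited above optimize explicitly.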

Primitive- or Atom-based Representations

Signals are modeled as sums over a small number of learned primitives—such as 2D Gaussians for images or video frames (Zhang et al., 2024, Lee et al., 6 Mar 2025, Li et al., 22 Dec 2025), or polygons/triangles for 3D geometry (Chou et al., 2016). Fitting and compressing the parameters (positions, shapes, colors) can yield highly compact and rapidly decodable representations.
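A minimal sketch of the decode side for isotropic 2-D Gaussians, with hypothetical hand-picked parameters (a real codec fits them to the image and entropy-codes the parameter table):

```python
import numpy as np

# Hypothetical parameters of N isotropic 2-D Gaussians: position (x, y),
# scale, and amplitude, in normalized image coordinates. A codec would
# transmit only this N x 4 table plus its quantization side information.
params = np.array([
    # x     y     scale  amplitude
    [0.30, 0.40,  0.10,  1.0],
    [0.70, 0.60,  0.05,  0.5],
])

def splat(params, h=64, w=64):
    """Decode: accumulate every Gaussian over a pixel grid ("splatting")."""
    ys, xs = np.mgrid[0:h, 0:w] / np.array([h, w]).reshape(2, 1, 1)
    img = np.zeros((h, w))
    for x0, y0, s, a in params:
        img += a * np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * s ** 2))
    return img

img = splat(params)
```

Decoding is a single accumulation pass with no network evaluation, which is why primitive-based codecs can reach very high decode frame rates.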

Feature and Embedding Compression

Latent representations (document embeddings, features from deep neural networks) are recursively or directly compressed to lower-dimensional or quantized forms via SVD, PCA, clustering, or learned transforms. The compressed representation is then usable for downstream analysis with minimal loss, and sometimes even performance gains due to denoising effects (Škrlj et al., 2021, Hu et al., 2021).
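The SVD route can be sketched on a synthetic embedding matrix (sizes and names here are illustrative, not from the cited methods):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "embedding matrix": 200 documents x 64 dims with rank-8 structure
# plus a small amount of noise.
U = rng.normal(size=(200, 8))
V = rng.normal(size=(8, 64))
E = U @ V + 0.01 * rng.normal(size=(200, 64))

def compress_svd(E, k):
    """Keep only the top-k singular triplets: the compressed form is
    (U_k * s_k, V_k), i.e. k numbers per document instead of 64."""
    Uf, s, Vt = np.linalg.svd(E, full_matrices=False)
    return Uf[:, :k] * s[:k], Vt[:k]

codes, basis = compress_svd(E, k=8)
E_hat = codes @ basis
rel_err = np.linalg.norm(E - E_hat) / np.linalg.norm(E)
```

When the data really is low-rank plus noise, the discarded components are mostly noise — which is the mechanism behind the denoising gains mentioned above.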

Sparse and Dictionary-based Codes

Signals are expressed in terms of a small set of basis atoms drawn from either an orthonormal basis (for 1D signals) or an overcomplete dictionary (for images), leveraging the Minimum Description Length (MDL) principle to optimize sparsity under a lossless or lossy constraint. This yields unique, data-driven compressed representations with practical discriminative power (Sabeti et al., 2021, Guha et al., 2012).
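Matching pursuit is one simple greedy instance of this idea; the sketch below (illustrative, not the cited MDL formulation) codes a signal as a short list of (atom index, coefficient) pairs:

```python
import numpy as np

def matching_pursuit(signal, D, n_atoms):
    """Greedy sparse coding: pick the dictionary atom most correlated with
    the residual, record (index, coefficient), subtract, repeat. The short
    list of (index, coefficient) pairs is the compressed representation."""
    residual = signal.astype(float).copy()
    code = []
    for _ in range(n_atoms):
        corr = D.T @ residual          # columns of D are unit-norm atoms
        j = int(np.argmax(np.abs(corr)))
        code.append((j, float(corr[j])))
        residual -= corr[j] * D[:, j]
    return code, residual

# Orthonormal dictionary (the identity, trivially unit-norm) as a sanity check.
D = np.eye(8)
signal = np.array([0.0, 3.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0])
code, residual = matching_pursuit(signal, D, n_atoms=2)
```

With an orthonormal dictionary the greedy choice is optimal; overcomplete dictionaries trade that guarantee for sparser codes.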

Sequential and Predictive Representations

Data is modeled as a sequence in which each element is predicted from its past, with predictive likelihoods trained to maximize compression. Notably, autoencoder-style neural networks can be trained to maximize exact log-likelihood, supporting direct arithmetic coding (Gregor et al., 2011).
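The link between prediction and code length is concrete: an arithmetic coder driven by a predictive model spends about $-\log_2 p(x_t \mid x_{<t})$ bits per symbol, so better predictions mean a shorter bitstream. The sketch below uses a count-based predictor in place of a neural one (illustrative only):

```python
import numpy as np

def ideal_code_length_bits(seq, predict):
    """Shannon ideal code length under a predictive model: sum of
    -log2 p(x_t | x_<t) over the sequence."""
    bits = 0.0
    for t, x in enumerate(seq):
        p = predict(seq[:t])   # distribution over symbols given the past
        bits += -np.log2(p[x])
    return bits

# Toy predictor with Laplace smoothing over a binary alphabet; a neural
# autoregressive model would play this role in practice.
def counts_predictor(past):
    c = np.bincount(np.asarray(past, dtype=int), minlength=2) + 1.0
    return c / c.sum()

seq = [0, 0, 0, 1, 0, 0, 0, 0]   # heavily biased toward 0
bits = ideal_code_length_bits(seq, counts_predictor)
```

The biased sequence codes in fewer than its 8 raw bits because the adaptive model quickly learns the bias.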

3. Algorithmic Pipelines and Architectures

Representation-based compression typically involves the following algorithmic steps:

  1. Data Fitting / Representation Learning
    • Overfit a model (e.g., neural network, dictionary, Gaussian primitives) to reconstruct the signal from compact parameters.
    • Often incorporates regularizers (e.g., total variation, sparsity) or structural mechanisms (e.g., hierarchical grids, content-adaptive allocation) (Kwan et al., 2023, Lee et al., 6 Mar 2025, Li et al., 22 Dec 2025).
  2. Quantization and Entropy Modeling
    • Quantize the representation parameters (e.g., uniform or vector quantization, often with quantization-aware training) and fit an entropy model to their distribution so they can be arithmetic- or range-coded.
  3. Rate-Distortion Optimization
    • Formulate an explicit rate–distortion loss of the form $L = D + \lambda R$, where $D$ is the data-domain distortion (e.g., MSE, MS-SSIM), $R$ is the estimated bit cost of the representation, and $\lambda$ controls the trade-off.
    • Illustrated in Bayesian INR compression (Guo et al., 2023), context-aware light field compression (Zhang et al., 17 Oct 2025), and end-to-end video coding (Kwan et al., 2024, Kwan et al., 2023).
  4. (Optional) Structural or Task-based Adaptation
    • Incorporate content-adaptive architecture search, frame-level adaptation, or scene-aware latent codes to further bias representation complexity towards dynamic and information-rich regions (Tang et al., 10 Feb 2025, Zhang et al., 17 Oct 2025).
  5. Bitstream Emission and Decoding
    • Serialize quantized/compressed representation parameters and side-information.
    • Decoding may consist of a forward pass through a lightweight model, geometric splatting, or context-driven inverse transforms.
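Steps 2–3 of the pipeline can be sketched end to end: quantize a candidate representation, estimate its rate from the empirical entropy of the quantized symbols (a stand-in for a learned entropy model), and score $L = D + \lambda R$. All names here are illustrative:

```python
import numpy as np

def empirical_entropy_bits(symbols):
    """Estimate rate R as the empirical entropy of the quantized symbols."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)) * len(symbols))

def rd_loss(params, signal, decode, step, lam):
    """Return (L, D, R) with L = D + lambda * R for one quantization step."""
    q = np.round(params / step)
    dist = np.mean((decode(q * step) - signal) ** 2)
    rate = empirical_entropy_bits(q.astype(int))
    return dist + lam * rate, dist, rate

# Toy setup: the "representation" is the signal itself and decoding is the
# identity map; a real codec would decode through its model.
rng = np.random.default_rng(0)
signal = rng.normal(size=1024)
decode = lambda w: w

# Sweep quantization steps: coarser steps cut rate but raise distortion.
losses = {s: rd_loss(signal, signal, decode, s, lam=1e-4) for s in (0.05, 0.2, 0.8)}
```

Sweeping `step` (or `lam`) traces out the codec's rate–distortion curve; end-to-end methods instead differentiate through both terms and optimize all parameters jointly.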

This high-level workflow is instantiated differently for each modality, as summarized in the table:

| Domain | Representation Type | Compression Method | Notable Reference |
| --- | --- | --- | --- |
| Image | INR, Gaussian splatting | QAT, VQ, bits-back, LSQ, entropy coding | (Zhang et al., 2024, Li et al., 22 Dec 2025) |
| Video | INR, Gaussian splatting | QAT, context entropy coding, DSA/DFA | (Kwan et al., 2023, Lee et al., 6 Mar 2025, Tang et al., 10 Feb 2025) |
| 3D Geometry | Polygon clouds (triangles) | Octree, RAHT, differential coding | (Chou et al., 2016) |
| NLP Embedding | Latent vector, SVD | Recursive SVD, clustering, autoencoder | (Škrlj et al., 2021) |
| Light Field | INR, hierarchical scene codes | QAT, codebook hyperprior | (Zhang et al., 17 Oct 2025, Wang et al., 2024) |
| Feature Analytics | Latent, codebook manifold | Codebook-based entropy, multi-task | (Hu et al., 2021) |

4. Theoretical and Empirical Performance

Representation-based compression routinely achieves bit-rate reductions of one or more orders of magnitude over naive sample-wise codecs while maintaining comparable (and sometimes superior) distortion metrics. Some key findings include:

  • Video: HiNeRV achieves 72.3% bitrate saving over HNeRV and 43.4% over DCVC on the UVG dataset (Kwan et al., 2023); NVRC secures 24.3% coding gain over VVC VTM (PSNR, UVG) (Kwan et al., 2024); CANeRV demonstrates negative BDBR vs. H.266/VVC (Tang et al., 10 Feb 2025).
  • Image: GaussianImage++ exceeds state-of-the-art INRs like COIN by up to 1 dB at low bpp, maintaining >1500 FPS decoding and a sub-0.1 MB model size (Li et al., 22 Dec 2025); partial bits-back in GaussianImage achieves further gain (Zhang et al., 2024).
  • Document representations: Recursive SVD (CoRe) can reduce embeddings by 10–50× while preserving or improving classification F1 scores due to task-relevant denoising (Škrlj et al., 2021).
  • 3D Geometry: Dynamic polygon clouds attain intra-frame geometry rates of 0.07–0.44 bpv, translating to >10× compression over classic octree codecs, with robustness to topology and noise artifacts (Chou et al., 2016).
  • Analytics representations: Codebook-manifold hyperpriors cut plateau bitrates for joint tasks by 3–6× versus baselines, and generalize to unseen tasks without retraining the compressor (Hu et al., 2021).

The rate–distortion trade-off of representation-based codecs is governed by the expressivity of the representation and the effectiveness of quantization and entropy modeling. Incorporating content or structure awareness routinely leads to significant RD improvements, as in scene-aware latent codes for light fields (Zhang et al., 17 Oct 2025) and distortional densification in GaussianImage++ (Li et al., 22 Dec 2025).
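The two axes of these comparisons are easy to state concretely: rate is usually reported in bits per pixel (bpp) and distortion as PSNR. A small sketch of both metrics (taking 1 MB = 2^20 bytes; the numbers are illustrative):

```python
import numpy as np

def bpp(total_bits, height, width):
    """Bits per pixel: size of the compressed artifact over the pixel count."""
    return total_bits / (height * width)

def psnr(ref, recon, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(float) - recon.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

# e.g. a 0.1 MB representation decoding a single 1920x1080 frame:
rate = bpp(0.1 * 8 * 1024 * 1024, 1080, 1920)
```

BD-rate and BD-PSNR figures such as those quoted above are then computed by integrating the gap between two codecs' (bpp, PSNR) curves.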

5. Emerging Techniques and Architectural Innovations

Recent research has advanced the state of RBC through the following developments:

  • Hierarchical Encodings and Multi-scale Grids: HiNeRV and NVRC introduce multi-resolution grid features and hierarchical convolutional/MLP architectures to efficiently encode spatiotemporal information for videos (Kwan et al., 2023, Kwan et al., 2024).
  • Primitive-based Densification and Filtering: GaussianImage++ uses distortion-driven densification and context-aware filtering to improve primitive allocation and representation capacity, enabling high-fidelity at lower Gaussian counts (Li et al., 22 Dec 2025).
  • Bayesian and Variational Neural Codecs: COMBINER employs variational Bayesian inference, β-ELBO optimization, and relative-entropy coding (A* sampler) for principled model-compression and direct rate–distortion control (Guo et al., 2023).
  • End-to-End Quantization-aware Rate-Distortion Optimization: SANR and NVRC implement quantization-aware training with explicit entropy modeling and hierarchical coding, enabling joint optimization of all codec parameters for global rate–distortion efficiency (Zhang et al., 17 Oct 2025, Kwan et al., 2024).
  • Content-Adaptive Modeling: CANeRV utilizes dynamic sequence-level and frame-level architectural adjustment to optimally allocate model capacity where needed, surpassing even standard codecs in a variety of video contexts (Tang et al., 10 Feb 2025).
  • Token-based Representation Compression: Representation Shift enables efficient on-the-fly token compression in large transformer models without attention mask construction, supporting up to 5.5× throughput boosts with minimal accuracy loss (Choi et al., 1 Aug 2025).

These innovations are often modular, allowing transfer across domains (e.g., Gaussian splatting for both images and videos; context-modeling in light field and video).

6. Limitations, Challenges, and Future Directions

While RBC achieves compelling results across data regimes, several open challenges remain:

  • Encoding Complexity: Overfitting INRs or fitting large numbers of geometric primitives for high-resolution signals can require significant compute time; approaches such as meta-learning, fast NAS, or hybrid architectures are active research areas (Li et al., 22 Dec 2025, Tang et al., 10 Feb 2025).
  • Entropy Model Overhead: For maximal RD efficiency, hierarchical quantization and entropy models must themselves be compressible, with their bit cost included in the global optimization (explicit in NVRC (Kwan et al., 2024)).
  • Task Generalization: Ensuring downstream task performance (e.g., analytics) or generalization to unseen data remains challenging. Codebook-based hyperpriors and joint-task optimization provide promising results but require further theory (Hu et al., 2021).
  • Memory and Hardware Constraints: While decoding is often fast for representation-based codecs, real-world deployments on edge devices or massive-scale databases necessitate further reductions in model size and computational footprint (Zhang et al., 2024).
  • Theoretical Understanding: While empirical rate–distortion curves are favorable, a unified theory linking representation expressivity, quantization geometry, entropy coding, and downstream utility is nascent.

Research is likely to intensify in content-adaptive representations, joint optimization of representation and compression pipeline, and hybrid explicit-implicit architectures. The cross-fertilization of ideas from analytics, computer vision, and systems engineering will continue to shape the field.

7. Representative Recent Advances and Comparative Table

The table below highlights distinctive recent contributions across modalities:

| Approach / Domain | Representation | Compression Mechanism | Key Results / Impact | Reference |
| --- | --- | --- | --- | --- |
| HiNeRV / Video | Hier. INR | Multi-grid, sparse quant., entropy coding | 72.3% bitrate saving over HNeRV, 43.4% over DCVC | (Kwan et al., 2023) |
| GaussianImage++ / Image | 2D Gaussians | Densification, CAF, LSQ+ | Outperforms COIN, >1 dB gain, >1500 FPS decode | (Li et al., 22 Dec 2025) |
| NVRC / Video | Hier. INR | Three-tier quant., context entropy coding | 24% BDBR gain over VVC-VTM | (Kwan et al., 2024) |
| CoRe / NLP Embedding | Embedding matrix | Recursive SVD | 10–50× reduction with negl. F1 loss, sometimes ↑ F1 | (Škrlj et al., 2021) |
| SANR / Light Field | Scene-aware INR | Hierarchical scene code, QAT | –65.6% BDBR vs. HEVC, +2.47 dB BD-PSNR | (Zhang et al., 17 Oct 2025) |
| Patch-based Relaxation | Patch collection | Matrix clustering/compression | ≤5% patch retention, minimal convergence change | (Harper et al., 2023) |
| Representation Shift / Token | Token matrix | ΔMLP vector/pruning | Up to 5.5× speedup in ViT/UMT, agnostic to arch. | (Choi et al., 1 Aug 2025) |

In conclusion, representation-based compression has become a dominant paradigm across diverse data modalities, leveraging advances in neural modeling, quantization, entropy modeling, and adaptive representation learning to achieve state-of-the-art rate–distortion and efficiency benchmarks. The field continues to deepen integration with task-oriented pipelines and to improve scalability, generalization, and comprehension of representation–compression tradeoffs.
