Implicit Compression: Function-Based Encoding

Updated 14 April 2026
  • Implicit Compression is a data compression method relying on implicit neural representations, where a compact neural function is overfitted to continuous data samples.
  • It optimizes a rate–distortion objective by learning per-sample network parameters that are quantized and entropy-coded, ensuring efficient storage and precise reconstruction.
  • Applications span images, video, point clouds, and scientific data, demonstrating advantages in spatial adaptivity, frequency control, and versatile encoding.

Implicit compression is a method of data compression in which signals (such as images, video, volumes, or point clouds) are represented not by storing sampled values or transform coefficients, but by overfitting a compact, continuous neural function—an implicit neural representation (INR)—to the data. The compressed bitstream consists of the learned parameters of this network, potentially after quantization, entropy coding, or further model compression. Decoding then reconstructs the data by evaluating the network on desired coordinates. This paradigm has emerged as a highly general, function-based alternative to classical and latent-space codecs for a wide range of modalities, offering unique advantages in representation flexibility, spatial adaptivity, and direct rate–distortion control (Yang et al., 2022, Strümpler et al., 2021, Schwarz et al., 2023).

1. Mathematical Formulation of Implicit Compression

A canonical INR is a parameterized function $f_\theta:\mathbb{R}^M\to\mathbb{R}^C$ mapping continuous coordinates $x$ to signal values $y$. In implicit compression, the signal $d(x)$ (e.g., image intensities, voxel occupancy, RGB colors) is approximated as $f_\theta(x)\approx d(x)$ over the domain of interest (pixels, voxels, spatial/temporal/spectral coordinates) (Yang et al., 2022, Dupont et al., 2021, Cho et al., 2 Jun 2025, Dai et al., 2024). For multi-dimensional data (images, videos, point clouds), $x$ can encompass spatial, angular, temporal, band, or view indices.

The compression process is thus reduced to learning, per sample, the network parameters $\theta^*$ that minimize a distortion metric $D(\theta)=\mathbb{E}_x\|f_\theta(x)-d(x)\|^2$. Once optimized, these weights are quantized ($q(\theta)$), perhaps sparsified or pruned, and entropy-coded (using learned, empirical, or parametric priors). The compressed bitstream consists mainly of the coded $\theta$, plus minor metadata (Strümpler et al., 2021, Damodaran et al., 2023, Cho et al., 2 Jun 2025, Wang et al., 2024).
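As a rough illustration of per-sample fitting, the sketch below overfits a tiny one-hidden-layer sine network to a 1-D signal by full-batch gradient descent on the mean squared error. The signal `target`, the network size, the initialization ranges, and the learning rate are all assumptions chosen for the example, not values taken from the cited papers.

```python
import math
import random

def target(x):
    # Hypothetical 1-D signal d(x); stands in for an image row or audio clip.
    return math.sin(2 * math.pi * x) + 0.5 * math.sin(6 * math.pi * x)

def init_theta(hidden, seed=0, omega=20.0):
    # Assumed initialization: random frequencies, phases, and small amplitudes.
    rng = random.Random(seed)
    return {
        "w": [rng.uniform(-omega, omega) for _ in range(hidden)],
        "b": [rng.uniform(-math.pi, math.pi) for _ in range(hidden)],
        "a": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "c": 0.0,
    }

def f(theta, x):
    # f_theta(x) = c + sum_j a_j * sin(w_j * x + b_j)
    return theta["c"] + sum(
        a * math.sin(w * x + b)
        for w, b, a in zip(theta["w"], theta["b"], theta["a"])
    )

def distortion(theta, xs, ys):
    # D(theta): mean squared error over the sampled coordinates.
    return sum((f(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(theta, xs, ys, steps=300, lr=0.01):
    # Plain gradient descent with hand-written gradients of D(theta).
    n, hidden = len(xs), len(theta["a"])
    for _ in range(steps):
        gw, gb, ga, gc = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            r = 2.0 * (f(theta, x) - y) / n  # dD/df at this sample
            gc += r
            for j in range(hidden):
                phase = theta["w"][j] * x + theta["b"][j]
                ga[j] += r * math.sin(phase)
                gw[j] += r * theta["a"][j] * math.cos(phase) * x
                gb[j] += r * theta["a"][j] * math.cos(phase)
        for j in range(hidden):
            theta["w"][j] -= lr * gw[j]
            theta["b"][j] -= lr * gb[j]
            theta["a"][j] -= lr * ga[j]
        theta["c"] -= lr * gc
    return theta

xs = [i / 64 for i in range(64)]
ys = [target(x) for x in xs]
theta = init_theta(hidden=16)
before = distortion(theta, xs, ys)
after = distortion(fit(theta, xs, ys), xs, ys)
```

Comparing `before` and `after` shows how much the distortion dropped. Practical codecs use deeper networks and adaptive optimizers, and typically run far more steps.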

The fundamental rate–distortion objective is

$$\min_\theta \; D(\theta) + \lambda\, R(\theta)$$

where $R(\theta)$ denotes the total coded bit-length and $\lambda$ trades off reconstruction distortion against bitrate (Strümpler et al., 2021, Damodaran et al., 2023, Cho et al., 2 Jun 2025).
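One way to make the rate term concrete is to estimate it from the empirical entropy of the quantized weights. The sketch below assumes the common Lagrangian form (distortion plus a multiplier times rate) and a zeroth-order entropy estimate. The helper names `quantize`, `empirical_rate_bits`, and `rd_loss` are hypothetical, and an actual entropy coder will land somewhat above this estimate.

```python
import math
from collections import Counter

def quantize(theta, step):
    # Map each weight to the index of the nearest multiple of `step`.
    return [round(t / step) for t in theta]

def empirical_rate_bits(symbols):
    # Zeroth-order estimate: (number of symbols) * (empirical entropy in bits).
    counts = Counter(symbols)
    n = len(symbols)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return n * entropy

def rd_loss(distortion, rate_bits, lam):
    # Assumed Lagrangian form: distortion + lam * rate.
    return distortion + lam * rate_bits
```

For example, `empirical_rate_bits(quantize(weights, 0.05))` gives an approximate bit budget for a weight list at step size 0.05. A larger `lam` pushes the optimum toward fewer bits at the cost of higher distortion.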

2. Theoretical Underpinnings and Spectral Properties

Implicit neural compression leverages specific inductive biases and spectral characteristics of the neural architectures. Shallow MLPs with sinusoidal activation (e.g., SIREN) embedded with Fourier features tend to concentrate approximation power around frequencies represented by their input encoding and hidden-layer width, leading to a “spectrum concentration” effect. This property strongly governs parameter efficiency and reconstruction fidelity for data with broadband or multiscale content (Yang et al., 2022).

For a given function with a target spectrum, the number of INR parameters, the frequency basis of the encoding, and the depth of the network determine how well high-frequency details are preserved. A key finding is that most spectral energy in shallow sinusoidal networks is localized around base frequencies, with amplitude in higher-order harmonics suppressed by the Bessel function decay in the Jacobi–Anger expansion. Thus, signals with a broad or spatially varying spectrum benefit from block-wise partitioning and local models (Yang et al., 2022, Yang et al., 2022).
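The Bessel decay can be checked numerically on the simplest nested sinusoid. By the Jacobi–Anger expansion, sin(z sin x) = 2 Σ_{n≥0} J_{2n+1}(z) sin((2n+1)x), so only odd harmonics appear and their amplitudes shrink quickly with order. The sketch below compares numerically integrated Fourier coefficients with a power-series Bessel function. Treating a single composed sine as a stand-in for a shallow sinusoidal network is a simplifying assumption.

```python
import math

def bessel_j(n, z, terms=30):
    # Power series for the Bessel function of the first kind, integer order n >= 0.
    return sum(
        (-1) ** m / (math.factorial(m) * math.factorial(m + n)) * (z / 2) ** (2 * m + n)
        for m in range(terms)
    )

def sine_harmonic(k, z, samples=2048):
    # Fourier sine coefficient b_k of g(x) = sin(z * sin(x)) over one period,
    # b_k = (1/pi) * integral of g(x) sin(kx) dx, approximated by a Riemann sum.
    total = 0.0
    for i in range(samples):
        x = 2 * math.pi * i / samples
        total += math.sin(z * math.sin(x)) * math.sin(k * x)
    return 2.0 * total / samples

z = 1.5
harmonics = {k: sine_harmonic(k, z) for k in range(1, 6)}
```

With `z = 1.5`, the entries of `harmonics` for k = 1, 3, 5 match `2 * bessel_j(k, z)` and fall off rapidly. The even-order entries are numerically zero.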

3. Model Architectures and Compression Workflows

The INR used for implicit compression is usually a compact MLP with periodic (sinusoidal, cosine), ReLU, or learnable activations, often applied after a Fourier feature positional encoding for high-frequency fidelity (Dupont et al., 2021, Yang et al., 2022, Zhang et al., 20 Apr 2025). Notable architectural augmentations include

  • Low-rank and adaptive basis factorization for reducing model size while retaining expressivity (e.g., ImpliSat framework uses low-rank modulated SIREN MLPs for multispectral image compression) (Cho et al., 2 Jun 2025).
  • Hierarchical parameter sharing, as in Tree-structured Implicit Neural Compression (TINC), which encodes both local and non-local redundancy by sharing early layers and only specializing late layers or additive blocks for high-frequency or region-specific details (Yang et al., 2022).
  • Block-wise partitioning, spectrum-based splitting (e.g., SCI), or patch-based optimization, allocating network capacity according to local frequency content or spatial statistics (Yang et al., 2022).
  • Learnable and adaptive activation functions in latent space, such as LeAFNet for point clouds, allowing fine spectral adaptation while maintaining a compact parameterization (Zhang et al., 20 Apr 2025).
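The Fourier feature positional encoding mentioned at the start of this section can be sketched as follows. The octave-spaced frequencies (powers of two times π) are one common choice and are an assumption here. The cited works may use random or learned frequencies instead.

```python
import math

def fourier_features(x, num_bands):
    # Encode a scalar coordinate x as [sin(2^k * pi * x), cos(2^k * pi * x)] for k < num_bands.
    feats = []
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi
        feats.append(math.sin(freq * x))
        feats.append(math.cos(freq * x))
    return feats
```

A coordinate in [0, 1] encoded with `num_bands=4` becomes an 8-dimensional input to the MLP. Raising `num_bands` exposes higher frequencies to the network but enlarges the first layer.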

The generic workflow is:

  1. Fit an INR $f_\theta$ to the target data sample;
  2. Quantize and entropy-code the weights $\theta$ (with bit-precise, possibly layer-wise quantization, entropy coding, or variational models);
  3. Transmit the resulting bitstream; and
  4. Decode by reconstructing the signal as $f_{q(\theta)}(x)$ on the target grid.
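Steps 2 to 4 of the workflow above can be sketched end to end with the standard library. Here `zlib` (DEFLATE) is only a stand-in for a proper entropy coder, the int8 range and step size are assumptions, and the "INR" in `evaluate` is a toy sum of fixed harmonics rather than a real network. The weights in `theta` are made-up values standing in for the output of step 1.

```python
import math
import struct
import zlib

def encode(theta, step):
    # Quantize to integer multiples of `step`, clamp to int8, pack, and compress.
    q = [max(-128, min(127, round(t / step))) for t in theta]
    return zlib.compress(struct.pack(f"{len(q)}b", *q), 9)

def decode(bitstream, step):
    # Invert the packing and rescale the integers back to weights.
    raw = zlib.decompress(bitstream)
    q = struct.unpack(f"{len(raw)}b", raw)
    return [v * step for v in q]

def evaluate(theta, grid):
    # Toy INR (assumption): f(x) = sum_k theta[k] * sin(2*pi*(k+1)*x).
    return [
        sum(t * math.sin(2 * math.pi * (k + 1) * x) for k, t in enumerate(theta))
        for x in grid
    ]

theta = [0.5, -0.25, 0.125, 0.0]  # hypothetical fitted weights from step 1
step = 1 / 64
bitstream = encode(theta, step)          # step 2 (step 3 would transmit this)
theta_hat = decode(bitstream, step)
grid = [i / 8 for i in range(8)]
recon = evaluate(theta_hat, grid)        # step 4
```

These example weights are exact multiples of `step`, so they survive the round trip unchanged. In general each decoded weight differs from the original by at most `step / 2`, provided it was not clamped.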

In video and dynamic point-cloud compression, spatio-temporal or view-time coordinates drive the INR, with temporal redundancy handled either via sequential parameterization, residuals, or global 4D embedding (Gao et al., 25 Mar 2025, Ruan et al., 2024). For multi-spectral images, dynamic Fourier modulation and hypernetworks adapt the parameterization per band and resolution (Cho et al., 2 Jun 2025).

4. Quantization, Entropy Coding, and Rate–Distortion Control

Compressed INRs are made bit-efficient via aggressive quantization (e.g., down to 7–8 bit precision with minimal degradation using quantization-aware training or AdaRound) (Strümpler et al., 2021, Damodaran et al., 2023), border-aware and empirical entropy coding of quantized weights (Damodaran et al., 2023, Strümpler et al., 2021), and learned priors on parameters (Gaussian, variational, data-driven) (Guo et al., 2023, Schwarz et al., 2023).
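For intuition about what 7–8 bit precision means, the sketch below applies plain round-to-nearest uniform quantization over the range of a weight tensor. This is deliberately simpler than the quantization-aware training or AdaRound procedures cited above, and the function names are hypothetical.

```python
def quantize_uniform(weights, bits):
    # Affine quantization: map [min, max] onto the integer codes 0 .. 2^bits - 1.
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    # Reconstruct approximate weights from the integer codes.
    return [lo + c * scale for c in codes]

weights = [-0.3, 0.0, 0.12, 0.45]  # made-up layer weights
codes, lo, scale = quantize_uniform(weights, bits=8)
restored = dequantize(codes, lo, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The reconstruction error per weight is bounded by half the step size `scale`, which halves with every extra bit. Applying the function per layer gives each layer its own `lo` and `scale`.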

Some frameworks perform meta-learning (e.g., with MAML) to find a global initialization for quick per-sample fitting and to minimize the amplitude and entropy of weight updates, which facilitates better quantization and entropy coding (Strümpler et al., 2021). Bayesian approaches fit variational posteriors to parameters and perform entropy coding using relative-entropy codes, setting rate precisely as the KL divergence to the prior ($D_{\mathrm{KL}}(Q\,\|\,P)$, for posterior $Q$ and prior $P$) (Guo et al., 2023).
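When both the variational posterior and the prior are factorized Gaussians (an assumption made here for illustration), the KL rate has a closed form per weight. The sketch below converts it to bits. The function names are hypothetical, and practical relative entropy coding incurs some overhead on top of this figure.

```python
import math

def gaussian_kl_bits(mu_q, sigma_q, mu_p, sigma_p):
    # KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), converted from nats to bits.
    nats = (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
        - 0.5
    )
    return nats / math.log(2)

def total_rate_bits(posterior, prior):
    # Sum the per-weight KL terms; each entry is a (mean, std) pair.
    return sum(
        gaussian_kl_bits(mq, sq, mp, sp)
        for (mq, sq), (mp, sp) in zip(posterior, prior)
    )
```

A posterior identical to the prior costs zero bits. The cost grows as the posterior mean moves away from the prior mean or as the posterior becomes narrower than the prior.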

The overall rate–distortion optimization is sometimes convexified and controlled directly during fitting by Lagrange multipliers or alternating quantization-aware and distortion-reducing gradient steps (Dupont et al., 2021, Damodaran et al., 2023, Cho et al., 2 Jun 2025).

5. Applications and Empirical Results Across Modalities

Images, Light Fields, and Volumetric Data

  • Simple INR codecs (COIN, RQAT-INR) can outpace JPEG and approach learned autoencoders at low bitrates, with competitive PSNR and lower decoder complexity (Dupont et al., 2021, Damodaran et al., 2023).
  • Meta-learned and quantization-aware INRs can close the gap to JPEG2000 and BPG at mid rates; INRs trained with entropy coding and advanced architecture outperform block coders on certain benchmarks (Strümpler et al., 2021, Damodaran et al., 2023).
  • Light-field INR compression encodes SAIs as functions of (u,v), storing angular redundancy implicitly, and achieves comparable or superior perceptual quality versus classical approaches (Wang et al., 2024).
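The comparisons above are stated in terms of PSNR and bitrate. For reference, a minimal sketch of both quantities for 8-bit images follows. The flat pixel lists and the `peak=255.0` default are assumptions of the example.

```python
import math

def psnr(original, reconstructed, peak=255.0):
    # Peak signal-to-noise ratio in dB between two equal-length pixel lists.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def bits_per_pixel(num_bytes, height, width):
    # Bitrate of a compressed file relative to the image resolution.
    return 8.0 * num_bytes / (height * width)
```

For an INR codec, `num_bytes` is the size of the coded weights plus metadata, so shrinking the network lowers the bits per pixel directly.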

Biomedical, Climate, and Scientific Data

  • Spectrum-concentrated and tree-based INRs (SCI, TINC) achieve state-of-the-art performance on medical volumes, neuronal structures, and atmospheric fields, outperforming HEVC and tailored codecs at high compression ratios (up to 512–1024×) (Yang et al., 2022, Yang et al., 2022, Xu et al., 2024).
  • Application-specific guidance (segmentation, perceptual, or task loss) can be seamlessly incorporated during fitting to prioritize critical features in microscopy or scientific imaging (Dai et al., 2024).

Point Cloud and Dynamic Scene Compression

  • Per-instance INR fitting for geometry and color fields, with quantization and adaptive activations, surpasses octree and G-PCC standards for both static and dynamic point clouds, with up to +7 dB PSNR improvement and up to 89% bit-rate reduction in dynamic settings (Zhang et al., 20 Apr 2025, Ruan et al., 2024).

Video and Multi-View

  • Implicit Neural Video Compression (IPF, GIViC) leverages coordinate-mapped frame prediction or full-GOP (Group of Pictures) functional fitting, outperforming MPEG-4 at low rates and, with generative hierarchical models, surpassing VTM and state-of-the-art neural codecs on benchmark sets (Gao et al., 25 Mar 2025, Zhang et al., 2021).
  • Multi-view integration approaches blend implicit and explicit coding, using INR-based codecs to encode all but one anchor view, enabling state-of-the-art scene modeling and improved rate-distortion over MIV (Zhu et al., 2023).

Advanced: Quantum and Modality-Agnostic Compression

  • Quantum neural architectures (quINR) improve rate-distortion by leveraging exponential Hilbert space capacity, delivering up to +1.2 dB PSNR gain over classical MLP INR on images at a given bit rate (Fujihashi et al., 2024).
  • Modality-agnostic variational INR compression (VC-INR) yields state-of-the-art performance across images, audio, climate, and 3D scenes, with architecture-agnostic soft-subnetwork gating and learned entropy models for the latent space (Schwarz et al., 2023).

6. Limitations, Generalization, and Extensions

The principal limitations of implicit compression are encoder-side computational cost (per-sample fitting generally requires thousands of gradient steps), challenges in capturing high-frequency content with small models (spectral bandwidth bottleneck), and the lack of fast, amortized inference for deployment at scale (Dupont et al., 2021, Strümpler et al., 2021, Damodaran et al., 2023, Xu et al., 2024). Overfitting-based methods may blur high-frequency, nonstationary structures unless model size or block count is increased, or spectrum-based partitioning is employed (Yang et al., 2022).

Recent approaches address these by:

  • Incorporating tree- or spectrum-structured block division with hierarchical parameter sharing (Yang et al., 2022, Yang et al., 2022).
  • Utilizing learned optimizers and meta-initializations to reduce fitting time (Dai et al., 2024, Strümpler et al., 2021).
  • Hybridizing explicit codecs for the dominant portions of data and using INR-based residual coding or adapters for the error, accelerating fitting without sacrificing quality (Dai et al., 2024, Zhu et al., 2023).
  • Employing probabilistic and variational compression methods (e.g., variational Bayesian INRs, relative-entropy coding) that provide finer rate-distortion control, improved entropy modeling, and modality-agnostic applicability (Guo et al., 2023, Schwarz et al., 2023).

Extensions under active study include neural architecture search for function parameterization, adaptively-learned quantization or pruning schedules, block-wise fitting for massive scenes, hybrid quantum-classical architectures, and expansion to new modalities including hyperspectral, scientific, environmental, and multi-modal data (Fujihashi et al., 2024, Cho et al., 2 Jun 2025, Xu et al., 2024).

7. Summary Table of Recent Implicit Compression Approaches

| Model/Framework | Domain | Key Architectural Features | Rate-Distortion/Empirical Outcomes | Reference |
|---|---|---|---|---|
| COIN, RQAT-INR | Images | MLP+Fourier features, QAT, AdaRound | Outperforms JPEG/JPEG2000 at low bpp | (Dupont et al., 2021, Damodaran et al., 2023) |
| TINC | Volumes | Tree-based, shared MLP layers | +1–2 dB PSNR vs. SOTA at 512–1024× compression | (Yang et al., 2022) |
| SCI | Biomedical | Spectrum-adaptive partition, funnel MLP | State-of-the-art on CT/sparse data, PSNR, Acc | (Yang et al., 2022) |
| GIViC, IPF | Video | INR+diffusion/transformer, coordinate flow | Outperforms VVC/VTM, MPEG-4 at low/RA bitrate | (Gao et al., 25 Mar 2025, Zhang et al., 2021) |
| ImpliSat | Multispectral | Fourier modulation, per-band adaptation | 10× compression at PSNR=36 dB (Sentinel-2) | (Cho et al., 2 Jun 2025) |
| PICO, NeRC³ | Point Clouds | Learnable activation, joint geometry/color | +4–7 dB vs. G-PCC, up to 89% BD-BR reduction | (Zhang et al., 20 Apr 2025, Ruan et al., 2024) |
| INIF, HiHa | Microscopy, Atmos | Multi-loss INR, harmonic decomposition | Up to 512× compression, 20–40× compression for 1e-3 RMSE | (Dai et al., 2024, Xu et al., 2024) |
| VC-INR, COMBINER | Multi-modal | Variational quantization, gating, Bayes prior | Outperforms JPEG2000, MP3, HEVC cross-modality | (Schwarz et al., 2023, Guo et al., 2023) |

Implicit compression thus constitutes a unifying and highly extensible framework for modality-agnostic, function-based compression, capable of rivaling or surpassing expert-tuned codecs in diverse domains through advances in rate-distortion optimization, network architecture, and entropy modeling.
