
GaussianImage++: Adaptive 2D Image Compression

Updated 29 December 2025
  • GaussianImage++ is an advanced image compression framework that represents images as sets of adaptive 2D Gaussian primitives, optimized for spatial compactness and rapid rendering.
  • It employs distortion-driven densification and context-aware filtering to allocate and refine primitives based on local reconstruction errors.
  • The framework integrates differentiable rendering and quantization-aware compression techniques, ensuring high-fidelity results at ultra-high resolutions with real-time performance.

GaussianImage++ refers to a class of advanced image representation and compression frameworks that encode images as sets of adaptive, content-aware 2D Gaussian primitives, optimized for both spatial compactness and rapid rendering. These methods extend the principles introduced by GaussianImage, combining distortion-driven primitive allocation, context-aware filtering, quantization-aware compression, and efficient differentiable rendering to yield high-fidelity, hardware-efficient image codecs suitable for ultra-high-resolution and real-time applications (Li et al., 22 Dec 2025, Zhang et al., 13 Mar 2024, Zhang et al., 2 Jul 2024).

1. Mathematical Basis and Representation Model

GaussianImage++ represents a target image $X(x)$ as a weighted sum of $N$ 2D anisotropic Gaussian primitives:

$$\hat{X}(x) = \sum_{i=1}^{N} w_i\, c_i \exp\left(-\frac{1}{2}(x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i)\right)$$

where for each primitive $i$:

  • $\mu_i \in \mathbb{R}^2$ is the 2D center,
  • $\Sigma_i \in \mathbb{R}^{2\times 2}$ is a positive semi-definite covariance encoding scale and orientation, often parameterized as $\Sigma_i = R(\theta_i)\, S_i^2\, R(\theta_i)^{\top}$,
  • $w_i$ is a scalar contribution (or amplitude; in some formulations $w_i$ is absorbed into $c_i$),
  • $c_i \in \mathbb{R}^3$ is the RGB color,
  • optionally, additional attributes such as per-primitive opacity or filter variance are included (Li et al., 22 Dec 2025, Zhang et al., 2 Jul 2024, Zhang et al., 13 Mar 2024, Wang et al., 14 Dec 2025).

Rendering at each pixel can proceed by direct weighted summation, order-agnostic $\alpha$-blending, or a normalized mixture for color integration, with no need for depth ordering (Zhang et al., 13 Mar 2024, Zhang et al., 2 Jul 2024, Li et al., 23 Dec 2025).
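As a concrete illustration, the direct weighted-summation rendering can be sketched in NumPy (a minimal sketch, not the papers' optimized renderer; the weight $w_i$ is assumed absorbed into the color $c_i$):

```python
import numpy as np

def render(mu, cov, color, H, W):
    """Render an H x W RGB image as a direct weighted sum of N anisotropic
    2D Gaussians; no depth ordering is needed. Weights are assumed to be
    absorbed into the colors."""
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)  # (HW, 2) in (x, y)
    inv = np.linalg.inv(cov)                # (N, 2, 2) inverse covariances
    img = np.zeros((H * W, 3))
    for i in range(len(mu)):
        d = pix - mu[i]                               # offsets from center
        q = np.einsum('pj,jk,pk->p', d, inv[i], d)    # Mahalanobis form
        img += np.exp(-0.5 * q)[:, None] * color[i]
    return img.reshape(H, W, 3)

# one red Gaussian centered at (8, 8) with isotropic covariance 4*I
img = render(np.array([[8.0, 8.0]]),
             np.array([[[4.0, 0.0], [0.0, 4.0]]]),
             np.array([[1.0, 0.0, 0.0]]), 16, 16)
```

Because the sum is order-agnostic, primitives can be evaluated in any order, which is what makes the representation friendly to parallel hardware.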

2. Primitive Allocation and Densification

A central innovation in GaussianImage++ is distortion-driven densification (D³) (Li et al., 22 Dec 2025). Rather than allocating primitives uniformly, the algorithm identifies regions where the current reconstruction error $D(x)$ is highest, inserting new Gaussians at those coordinates up to a fixed budget $M$:

  • Compute the per-pixel distortion $D(x) = |X(x) - \hat{X}(x)|$.
  • Select the $k = \lceil (M-N)/2 \rceil$ coordinates with the highest $D(x)$ as new primitive centers.
  • Initialize attributes (covariance, color) to local image statistics or learned priors.
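Under simplifying assumptions (distortion taken as the summed absolute RGB error; new covariances and colors initialized elsewhere), the growth steps above can be sketched as:

```python
import numpy as np

def densify(X, X_hat, mu, M):
    """Distortion-driven densification: place new Gaussian centers where
    per-pixel reconstruction error is largest (simplified sketch).

    X, X_hat: (H, W, 3) target and current rendering; mu: (N, 2) centers;
    M: total primitive budget. Returns centers augmented with k new ones."""
    N = len(mu)
    k = int(np.ceil((M - N) / 2))           # grow halfway toward the budget
    D = np.abs(X - X_hat).sum(axis=-1)      # per-pixel distortion D(x)
    flat = np.argsort(D.ravel())[::-1][:k]  # k highest-error pixels
    ys, xs = np.unravel_index(flat, D.shape)
    new_mu = np.stack([xs, ys], axis=-1).astype(np.float64)
    return np.concatenate([mu, new_mu], axis=0)

X = np.zeros((4, 4, 3)); X[1, 2] = 1.0        # error concentrated at one pixel
mu = densify(X, np.zeros_like(X), np.zeros((0, 2)), 2)
# the single new center lands on the highest-error pixel (x=2, y=1)
```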

This adaptive placement is complemented by dynamic merge/prune steps: primitives close in position/covariance or exhibiting negligible contribution are merged or removed, yielding content-aware spatial sparsity (Zhang et al., 2 Jul 2024).

Further, exclusion-based uniform sampling is used in tandem with importance-based sampling to guarantee robust coverage and avoid clustering, especially in ultra-high-resolution or high-compression scenarios (Li et al., 23 Dec 2025).

3. Context- and Content-Aware Filtering

To mitigate under-coverage in sparse settings and accelerate convergence, each Gaussian’s effective support is adaptively controlled by a context-aware filter (CAF) variance $s_i$ (Li et al., 22 Dec 2025). At growth steps, new Gaussians are initialized with large $s_i$ (yielding broad support), which shrinks as the number of primitives grows:

  • $s_i = HW/(\alpha N_t)$ for newly added primitives $i$; otherwise $s_i$ keeps its previous value, where $H \times W$ is the image size and $N_t$ the current primitive count.
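A minimal sketch of this schedule (the value of $\alpha$ here is illustrative, not taken from the paper):

```python
def caf_variance(H, W, alpha, N_t):
    """Context-aware filter variance for a newly added Gaussian: support
    is broad when few primitives exist and shrinks as N_t grows."""
    return (H * W) / (alpha * N_t)

# support halves as the primitive count doubles
s_early = caf_variance(512, 512, alpha=4.0, N_t=1024)
s_late = caf_variance(512, 512, alpha=4.0, N_t=2048)
```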

GaussianImage++ also employs gradient–color guided variational sampling or spatial priors (via neural nets) to concentrate primitives on high-frequency, high-variance, or semantically relevant regions (Li et al., 23 Dec 2025, Wang et al., 14 Dec 2025).

4. Differentiable Rendering and Optimization

Rendering is fully differentiable with respect to all primitive parameters, allowing end-to-end optimization using pixel-wise losses. The primary objective combines an $L_1$/$L_2$ term with structural similarity (SSIM), sometimes augmented by regularization on covariances and weights:

$$L(\Theta) = \lambda\,\| \hat{X}(x; \Theta) - X(x) \|_1 + (1-\lambda)\bigl(1-\mathrm{SSIM}(\hat{X}, X)\bigr)$$

where $\Theta = \{\mu_i, \Sigma_i, c_i\}_{i=1}^{N}$.

Optimization is performed via Adam or similar first-order methods. Joint multi-attribute refinement over all parameters allows the system to capture both sharp local features and broad global structure efficiently (Zhang et al., 2 Jul 2024, Li et al., 23 Dec 2025).
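To make the optimization concrete, the following sketch fits only the colors by plain gradient descent on an $L_2$ loss; rendering is linear in color, so the gradient is analytic. This is a simplified stand-in for the Adam-based joint refinement of all attributes and omits the SSIM term:

```python
import numpy as np

def fit_colors(X, mu, cov, steps=200, lr=0.5):
    """Fit per-Gaussian colors by gradient descent on a mean-squared loss.
    A minimal stand-in for the paper's joint multi-attribute refinement:
    only colors are optimized, and the SSIM term is omitted."""
    H, W, _ = X.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)
    inv = np.linalg.inv(cov)
    # G[p, i]: response of Gaussian i at pixel p
    G = np.stack([np.exp(-0.5 * np.einsum('pj,jk,pk->p',
                                          pix - mu[i], inv[i], pix - mu[i]))
                  for i in range(len(mu))], axis=1)
    c = np.zeros((len(mu), 3))
    target = X.reshape(-1, 3)
    for _ in range(steps):
        resid = G @ c - target              # (HW, 3) reconstruction error
        c -= lr * (G.T @ resid) / len(pix)  # gradient of mean squared error
    return c

# recover a known color from a synthetic one-Gaussian target
mu = np.array([[4.0, 4.0]])
cov = np.array([[[4.0, 0.0], [0.0, 4.0]]])
ys, xs = np.mgrid[0:8, 0:8]
pix = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)
d = pix - mu[0]
g = np.exp(-0.5 * ((d @ np.linalg.inv(cov[0])) * d).sum(-1))
target = (g[:, None] * np.array([0.8, 0.2, 0.1])).reshape(8, 8, 3)
c = fit_colors(target, mu, cov)
```

Because the loss is quadratic in the colors, this sub-problem converges to the least-squares solution; positions and covariances enter nonlinearly and are what make the full joint optimization harder.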

To enable practical inference, random-access decoding is indexed via spatial acceleration structures, typically a hierarchical BSP tree, restricting per-pixel computation to the $K$ most relevant Gaussians, with per-pixel operation counts on the order of $O(10^2)$ multiply-accumulates (Zhang et al., 2 Jul 2024). Integration across pixel areas is generally approximated by point-sampling at centers, owing to the smoothness of Gaussians at typical filter scales.
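The lookup can be illustrated with a brute-force stand-in for the BSP-tree query (the tree only accelerates this same nearest-primitive selection):

```python
import numpy as np

def top_k_gaussians(pixel, mu, K):
    """Restrict per-pixel evaluation to the K primitives with the nearest
    centers: a brute-force stand-in for a hierarchical BSP-tree lookup."""
    d2 = ((mu - pixel) ** 2).sum(axis=1)   # squared distances to all centers
    return np.argsort(d2)[:K]              # indices of the K closest

mu = np.array([[0.0, 0.0], [10.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
idx = top_k_gaussians(np.array([0.5, 0.5]), mu, K=2)
# only the two nearby Gaussians (indices 0 and 2) are evaluated
```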

5. Compression Pipeline and Quantization

GaussianImage++ achieves compression by quantizing each set of primitive parameters using attribute-wise learnable scalar quantizers (LSQ+) (Li et al., 22 Dec 2025):

  • Positions: 12 bits per coordinate,
  • Covariances: 10 bits per matrix entry,
  • Colors: 6–8 bits per channel.

Quantizer scale/offsets are trainable and incorporated into quantization-aware fine-tuning. During encoding, quantized codes are packed directly (or with optional entropy coding), producing explicit, byte-aligned bitstreams.
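A simplified sketch of attribute-wise scalar quantization with the bit widths above (fixed lo/hi ranges stand in for the trained LSQ+ scale/offset parameters, and the ranges here are illustrative):

```python
import numpy as np

def quantize(x, bits, lo, hi):
    """Uniform scalar quantization to a given bit width. In LSQ+ the
    scale/offset (here the fixed lo/hi range) are learned during
    quantization-aware fine-tuning."""
    levels = 2 ** bits - 1
    code = np.round((np.clip(x, lo, hi) - lo) / (hi - lo) * levels).astype(np.int64)
    dequant = code / levels * (hi - lo) + lo   # reconstruction for rendering
    return code, dequant

# attribute-wise widths from the text: 12-bit positions, 10-bit covariance
# entries, 8-bit colors
pos_code, pos_hat = quantize(np.array([123.456, 7.89]), 12, 0.0, 1024.0)
col_code, col_hat = quantize(np.array([0.5, 0.25, 1.0]), 8, 0.0, 1.0)
bits_per_primitive = 2 * 12 + 4 * 10 + 3 * 8   # 88 bits before entropy coding
```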

Some variants employ additional vector or residual quantization for color (e.g., two-stage RVQ), and bits-back coding to minimize storage for large primitive sets (Zhang et al., 13 Mar 2024). Decoding entails direct readout and rendering, with typical performance exceeding 1,000 FPS on commodity GPUs.
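A toy sketch of the two-stage residual quantization idea for colors (the codebooks here are hand-picked for illustration; in practice they are learned):

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Two-stage residual vector quantization: quantize each color against
    the first codebook, then quantize the remaining residual against the
    second. Returns the per-stage indices and the reconstruction."""
    codes, resid = [], x.copy()
    for cb in codebooks:
        # nearest codeword for each vector's current residual
        idx = ((resid[:, None, :] - cb[None]) ** 2).sum(-1).argmin(1)
        codes.append(idx)
        resid = resid - cb[idx]
    return codes, x - resid

cb1 = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])   # coarse stage
cb2 = np.array([[0.0, 0.0, 0.0], [0.1, 0.1, 0.1]])   # residual stage
codes, rec = rvq_encode(np.array([[0.92, 0.88, 0.95]]), [cb1, cb2])
```

Each extra stage refines the reconstruction at the cost of one more small index per primitive, which is what makes RVQ attractive at low bitrates.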

6. Empirical Results and Performance

Extensive evaluation on benchmarks such as Kodak, DIV2K, DIV8K, and 16K datasets demonstrates that GaussianImage++ consistently outperforms earlier GaussianImage and neural INR codecs (COIN, COIN++) in both rate–distortion (PSNR, MS-SSIM) and decoding efficiency (Li et al., 22 Dec 2025, Zhang et al., 13 Mar 2024, Li et al., 23 Dec 2025).

Representative results:

| Method | PSNR (setting) | FPS (GPU) | Memory (primitives) |
|------------------------|---------------------------------|-----------|---------------------|
| GaussianImage | 32.48 dB (Kodak, 10k Gaussians) | 2,000 | ~0.08 M |
| GaussianImage++ | 35.41 dB (Kodak, 10k Gaussians) | 2,000 | ~0.08 M |
| COIN | 25.80 dB (DIV2K, 0.34 bpp) | 166 | (INR, large) |
| GaussianImage++ (QAT) | 25.66 dB (DIV2K, 0.32 bpp) | 942 | (low) |

A plausible implication is that the distortion-driven densification and content-aware filtering account for ~2–4 dB improvements over prior Gaussian splatting methods at equal bitrates and primitive counts (Li et al., 22 Dec 2025). Furthermore, explicit memory usage and decoding cost per frame remain stable as resolution increases.

SmartSplat extends GaussianImage++ methodology to extreme regimes (e.g., 16K images, compression ratios up to 3,000×), utilizing gradient–color statistics and exclusion-driven spatial allocation to maintain fidelity where other methods run out of memory or collapse (Li et al., 23 Dec 2025).

7. Extensions, Applications, and Limitations

GaussianImage++ forms the core of a rapidly expanding ecosystem:

  • SmartSplat demonstrates feature-guided primitive placement and scale-adaptive color priors, suggesting plugins for semantic- or texture-aware extensions (Li et al., 23 Dec 2025).
  • Instant GaussianImage and Fast-2DGS employ neural position/attribute prior networks for real-time, adaptive Gaussian allocation, especially beneficial for batch or streaming scenarios (Zeng et al., 30 Jun 2025, Wang et al., 14 Dec 2025).
  • Editable and physics-aware regimes (e.g., MiraGe) embed GaussianImage++ representations into 3D or animation workflows via direct mapping to parametric triangle soups and flat Gaussians (Waczyńska et al., 2 Oct 2024).
  • Hardware efficiency is enhanced via blockwise spatial search, SIMD-optimized kernels, and low-bit attribute encodings, making deployment feasible on both high-end GPUs and memory-constrained mobile devices (Zhang et al., 2 Jul 2024).

Limitations include suboptimal PSNR at high bitrates relative to VAE-based codecs, lack of entropy-optimized bitstreams, and constrained support for temporal or layered compositions. Ongoing research explores integrating entropy models, multi-scale densification policies, and video/3D hybrids (Li et al., 22 Dec 2025, Li et al., 23 Dec 2025, Wang et al., 14 Dec 2025).

