
Variational image compression with a scale hyperprior (1802.01436v2)

Published 1 Feb 2018 in eess.IV, cs.IT, and math.IT

Abstract: We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.

Variational Image Compression with a Scale Hyperprior

The paper introduces an advanced methodology for variational image compression, built on the framework of variational autoencoders (VAEs). The approach stands out by incorporating a hyperprior to capture spatial dependencies in the latent representation, an enhancement over earlier methods in which such dependencies are typically hand-engineered.

The authors propose a novel end-to-end trainable neural network model that notably includes a learned entropy model. The hyperprior's latents act as side information, a concept ubiquitous in conventional codecs but traditionally underutilized in artificial neural network (ANN) based compression techniques.

Key Contributions:

  1. End-to-End Trainable Model: The paper proposes a VAE-based model where both the autoencoder and the entropy model (including the hyperprior) are jointly trained. This enables optimization for overall compression efficiency directly through backpropagation.
  2. Hyperprior Integration: Unlike earlier approaches that rely on hand-engineered structures for side information, this model learns latent-space dependencies directly. The hyperprior predicts a scale parameter for each latent element, capturing the variability of neighboring elements and reducing the mismatch between the prior and the actual distribution of the latents (see the sketch after this list).
  3. Superior Rate-Distortion Performance: The proposed model demonstrates superior performance in image compression benchmarks, particularly when evaluated using the MS-SSIM and PSNR metrics. For instance, the model shows a substantial reduction in bit rate for similar levels of distortion when compared to both conventional codecs (e.g., JPEG, BPG) and contemporary ANN-based methods.
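
To ground item 2, here is a minimal PyTorch-style sketch of the rate term implied by the scale hyperprior. The function name, tensor names, and numerical clamps are our own illustration; the paper specifies the model mathematically rather than as code. Each latent element is modeled as a zero-mean Gaussian whose scale is predicted from the side information, convolved with a unit-width uniform density to match the additive-noise relaxation used during training:

```python
import torch

def scale_prior_bits(y_tilde, sigma, eps=1e-9):
    """Total bits to code the noisy latents under the scale prior:
    p(y) = Phi((y + 0.5) / sigma) - Phi((y - 0.5) / sigma),
    a zero-mean Gaussian convolved with a unit-width uniform."""
    gauss = torch.distributions.Normal(0.0, sigma.clamp(min=1e-6))
    p = gauss.cdf(y_tilde + 0.5) - gauss.cdf(y_tilde - 0.5)
    return -torch.log2(p.clamp(min=eps)).sum()
```

Because sigma is derived from the transmitted side information, the prior adapts to each image, which is precisely how the mismatch between the prior and the actual distribution is reduced.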

Technical Insights:

The model's architecture extends the typical autoencoder with an analysis transform (g_a) and a synthesis transform (g_s), and introduces additional parametric transforms (h_a, h_s) to model the hyperprior. The paper depicts this structure and its operational flow in detail. The hyper-latents themselves are modeled with a non-parametric, fully factorized density convolved with a uniform density, which allows the entropy model to closely approximate their true marginal distribution.
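
A minimal PyTorch-style sketch of this layout follows. Channel widths N and M, layer counts, and the ReLU/Softplus nonlinearities are illustrative stand-ins; the paper's transforms are deeper and use GDN/IGDN nonlinearities:

```python
import torch
import torch.nn as nn

class ScaleHyperpriorAE(nn.Module):
    """Schematic arrangement of the four transforms; sizes are illustrative."""
    def __init__(self, N=128, M=192):
        super().__init__()
        # g_a: image -> latents y (strided convolutions downsample)
        self.g_a = nn.Sequential(
            nn.Conv2d(3, N, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(N, N, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(N, M, 5, stride=2, padding=2))
        # g_s: quantized latents -> reconstructed image
        self.g_s = nn.Sequential(
            nn.ConvTranspose2d(M, N, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(N, N, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(N, 3, 5, stride=2, padding=2, output_padding=1))
        # h_a: |y| -> hyper-latents z (the side information)
        self.h_a = nn.Sequential(
            nn.Conv2d(M, N, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(N, N, 5, stride=2, padding=2))
        # h_s: quantized z -> predicted scales sigma for the latents
        self.h_s = nn.Sequential(
            nn.ConvTranspose2d(N, N, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.Conv2d(N, M, 3, stride=1, padding=1),
            nn.Softplus())  # keeps sigma positive

    def forward(self, x):
        y = self.g_a(x)
        z = self.h_a(y.abs())  # the hyperprior summarizes the magnitudes of y
        z_tilde = z + torch.empty_like(z).uniform_(-0.5, 0.5)  # noise proxy for quantization
        sigma = self.h_s(z_tilde)
        y_tilde = y + torch.empty_like(y).uniform_(-0.5, 0.5)
        x_hat = self.g_s(y_tilde)
        return x_hat, y_tilde, z_tilde, sigma
```

Note that h_a operates on the magnitudes of the latents: the hyperprior only needs enough information to predict their scales.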

During training, the quantizer is approximated by adding uniform noise, which keeps the objective differentiable and enables gradient-based optimization; at inference time, the latents are rounded and entropy-coded. The loss is derived from the KL divergence of the variational framework, which decomposes into a rate term (expected code length) and a distortion term, so bit rate and reconstruction quality are optimized simultaneously.
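
A sketch of the corresponding training step, continuing the two sketches above (the assembly, the value of lam, and the omission of the hyper-latents' rate under the learned non-parametric density are our simplifications):

```python
import torch
import torch.nn.functional as F

def quantize(v, training):
    # Training: additive uniform noise keeps the objective differentiable.
    # Inference: hard rounding, as an actual entropy coder requires.
    if training:
        return v + torch.empty_like(v).uniform_(-0.5, 0.5)
    return torch.round(v)

def train_step(model, x, opt, lam=0.01):
    # The model's forward pass above already applies the training-time noise.
    x_hat, y_tilde, z_tilde, sigma = model(x)
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    rate = scale_prior_bits(y_tilde, sigma) / num_pixels  # bits per pixel
    # (The rate of z_tilde under the factorized density is elided here.)
    dist = F.mse_loss(x_hat, x)  # or 1 - MS-SSIM for perceptual training
    loss = rate + lam * dist     # relaxed rate-distortion objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```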

Experimental Validation:

Extensive experiments validate the model's efficiency across different metrics and datasets. The results indicate:

  • PSNR Performance: When trained for mean squared error (MSE), the hyperprior model approaches the performance of HEVC, a codec known for its compression efficiency.
  • MS-SSIM Performance: For perceptual quality benchmarks, this model outperforms state-of-the-art ANN-based methods, particularly at higher bit rates.

Additionally, the side information itself is cheap, contributing less than 0.1 bpp on average, so the rate savings from the better-matched entropy model outweigh the extra bits spent transmitting it.
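
As a back-of-envelope illustration of what that figure means in practice (our arithmetic, not a number from the paper):

```python
# Cost of 0.1 bpp of side information on a 768x512 (Kodak-sized) image.
w, h = 768, 512
side_bits = w * h * 0.1                    # 39,321.6 bits
print(f"~{side_bits / 8 / 1024:.1f} KiB")  # ~4.8 KiB per image
```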

Practical Implications and Future Directions:

The introduction of a hyperprior provides a new paradigm for enhancing image compression via ANNs, moving closer to the performance of traditional highly optimized methods like HEVC but with the flexibility and adaptability of ANN-based approaches. The model's ability to be optimized for different distortion metrics makes it versatile for various applications, whether focusing on visual quality or numerical fidelity.

Future work could explore more complex dependencies between elements of the latent space, potentially introducing hierarchical or multi-scale hyperpriors. Integrating advanced perceptual metrics directly into the loss function could also yield more visually coherent reconstructions.

This research lays a robust foundation for future developments in ANN-based image compression, highlighting the importance of trainable entropy models and the potential of learned side information to achieve optimal rate-distortion performance.

Authors
  1. Johannes Ballé
  2. David Minnen
  3. Saurabh Singh
  4. Sung Jin Hwang
  5. Nick Johnston

Citations: 1,565