Amortised MAP Inference for Image Super-resolution (1610.04490v3)

Published 14 Oct 2016 in cs.CV, cs.LG, and stat.ML

Abstract: Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A more desirable approach would employ Maximum a Posteriori (MAP) inference, preferring solutions that always have a high probability under the image prior, and thus appear more plausible. Direct MAP estimation for SR is non-trivial, as it requires us to build a model for the image prior from samples. Furthermore, MAP inference is often performed via optimisation-based iterative algorithms which don't compare well with the efficiency of neural-network-based alternatives. Here we introduce new methods for amortised MAP inference whereby we calculate the MAP estimate directly using a convolutional neural network. We first introduce a novel neural network architecture that performs a projection to the affine subspace of valid SR solutions ensuring that the high resolution output of the network is always consistent with the low resolution input. We show that, using this architecture, the amortised MAP inference problem reduces to minimising the cross-entropy between two distributions, similar to training generative models. We propose three methods to solve this optimisation problem: (1) Generative Adversarial Networks (GAN) (2) denoiser-guided SR which backpropagates gradient-estimates from denoising to train the network, and (3) a baseline method using a maximum-likelihood-trained image prior. Our experiments show that the GAN based approach performs best on real image data. Lastly, we establish a connection between GANs and amortised variational inference as in e.g. variational autoencoders.

Summary

The paper demonstrates that amortised MAP inference via CNNs yields high-resolution images with enhanced perceptual quality compared to conventional MSE methods.
It proposes three alternative training methods (GAN-based, denoiser-guided, and density-based), each of which minimizes the cross-entropy between the network's output distribution and the natural image prior.
The approach could benefit applications such as medical imaging and digital media, where realistic image reconstruction matters.

Amortised MAP Inference for Image Super-Resolution

The paper addresses image super-resolution, an underdetermined inverse problem in computer vision in which a high-resolution image must be reconstructed from a low-resolution one. Traditional methods often rely on empirical risk minimization with a pixel-wise mean squared error (MSE) loss. Because many high-resolution images are consistent with the same low-resolution input, the MSE-optimal prediction is an average over all of them, which tends to be blurry and visually implausible. The paper instead targets Maximum a Posteriori (MAP) inference, which prefers solutions that have high probability under an image prior and therefore look more plausible.
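The averaging effect can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a two-pixel high-resolution patch, a downsampling operator that averages the two pixels, and two equally likely sharp patches that explain the same low-resolution value. All names are hypothetical.

```python
# Toy assumption: two sharp HR patches are equally likely, and both
# average to the same low-resolution value 0.5.
plausible = [(0.0, 1.0), (1.0, 0.0)]


def expected_mse(pred, candidates):
    """Mean squared error of `pred`, averaged over equally likely targets."""
    total = 0.0
    for target in candidates:
        total += sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return total / len(candidates)


# The pixel-wise mean of the plausible patches is a flat (blurry) patch.
mean_patch = tuple(sum(vals) / len(plausible) for vals in zip(*plausible))

mse_of_mean = expected_mse(mean_patch, plausible)     # flat patch
mse_of_sharp = expected_mse(plausible[0], plausible)  # one sharp patch

print(mean_patch, mse_of_mean, mse_of_sharp)
```

The flat patch achieves the lower expected MSE even though it is not one of the plausible sharp patches, which is the behaviour a MAP-style objective is meant to avoid.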

Methodology

The researchers perform amortised MAP inference with a convolutional neural network (CNN): rather than running an iterative optimization for each image, the network is trained to output the MAP estimate directly. Its architecture ends with a projection onto the affine subspace of valid super-resolution (SR) solutions, which guarantees that the high-resolution output, once downsampled, reproduces the low-resolution input. With this consistency built in, the training objective reduces to minimizing the cross-entropy between the distribution of the network's outputs and the prior distribution of high-resolution images, a problem similar to training a generative model.
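A minimal sketch of such a projection follows. It assumes a 1D signal and a downsampling operator A that averages non-overlapping blocks, in which case the pseudoinverse A^+ simply repeats each sample; the projection is then (I - A^+ A) f + A^+ x, where f is the raw network output and x the low-resolution input. The function names and the 1D setting are illustrative assumptions, not the paper's implementation.

```python
def downsample(hr, factor=2):
    """A: average each non-overlapping block of `factor` samples."""
    return [sum(hr[i:i + factor]) / factor for i in range(0, len(hr), factor)]


def upsample(lr, factor=2):
    """A^+ for block averaging: repeat each sample `factor` times."""
    return [v for v in lr for _ in range(factor)]


def project(raw_hr, lr, factor=2):
    """Project raw_hr onto {y : downsample(y) == lr}.

    Computes (I - A^+ A) raw_hr + A^+ lr by adding back the upsampled
    residual between lr and the downsampled raw output.
    """
    residual = [x - d for x, d in zip(lr, downsample(raw_hr, factor))]
    correction = upsample(residual, factor)
    return [r + c for r, c in zip(raw_hr, correction)]


lr = [0.5, 2.0]                 # low-resolution input
raw_hr = [0.0, 0.5, 3.0, 2.0]   # hypothetical unconstrained network output
hr = project(raw_hr, lr)

print(hr)              # within-block detail of raw_hr is preserved
print(downsample(hr))  # matches lr
```

The projection changes only the block means of the raw output, so any detail the network adds within a block passes through unchanged while consistency with the input is enforced exactly.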

Three distinct approaches are developed to achieve this minimization:

Generative Adversarial Networks (GANs): The paper proposes a GAN variant whose generator objective minimizes the Kullback-Leibler (KL) divergence between the distribution of network outputs and the prior distribution of natural images, so that generated images concentrate on regions that are probable under the natural image distribution.
Denoiser-Guided Super-Resolution: This method uses a denoiser to estimate the gradient of the log-probability of the image prior and backpropagates these estimates to train the SR network. It relies on the fact that the correction applied by a good denoiser approximates a gradient step on the log-density of (noise-smoothed) natural images, so the denoiser captures natural image statistics without an explicit density model.
Density-Guided SR: Here, a tractable density model trained by maximum likelihood serves as the image prior, and the SR network is trained to increase the log-density of its outputs under that model. The paper treats this as a baseline.
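The link between denoising and log-probability gradients used by the denoiser-guided method can be checked in a case where everything is available in closed form. The sketch below assumes a 1D Gaussian prior and additive Gaussian noise, which is a toy setting and not the paper's learned denoiser; in that setting the Bayes-optimal denoiser f satisfies (f(z) - z) / noise_var = d/dz log p_smoothed(z). All names are hypothetical.

```python
def optimal_denoiser(z, mu, prior_var, noise_var):
    """Posterior mean of the clean value given noisy z (Gaussian prior and noise)."""
    return (prior_var * z + noise_var * mu) / (prior_var + noise_var)


def score_from_denoiser(z, mu, prior_var, noise_var):
    """Gradient estimate of the log-density, read off the denoiser's correction."""
    return (optimal_denoiser(z, mu, prior_var, noise_var) - z) / noise_var


def exact_score(z, mu, prior_var, noise_var):
    """d/dz log N(z; mu, prior_var + noise_var), the noise-smoothed prior."""
    return -(z - mu) / (prior_var + noise_var)


mu, prior_var, noise_var = 1.0, 4.0, 0.25
for z in (-2.0, 0.0, 1.0, 3.5):
    est = score_from_denoiser(z, mu, prior_var, noise_var)
    ref = exact_score(z, mu, prior_var, noise_var)
    print(z, est, ref)
```

In the SR setting, a quantity like `score_from_denoiser` evaluated at the network's output would play the role of the gradient signal that is backpropagated, with a trained denoiser standing in for the closed-form one assumed here.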

Results and Discussion

The experiments support the proposed methods, with the GAN-based approach performing best. The AffGAN model produces sharper images than MSE-trained models, and its outputs are qualitatively better aligned with human perception. Numerically, AffGAN achieves cross-entropy values closer to those of an ideal MAP solution, both on a toy 2D Swiss Roll dataset and on natural images from the CelebA and ImageNet collections.

Implications and Future Directions

The implications of this research extend significantly into practical applications where high-quality image reconstruction is paramount. Potential domains include medical imaging, where precise and realistic reconstructions can enhance diagnostic capabilities, and digital media, where visual realism is critical.

Future work may explore stochastic generators that sample several plausible high-resolution outputs for the same input rather than a single estimate, and investigate further the connection between GANs and variational inference that the paper establishes. Placing GAN-based methods within a variational inference framework could lead to improved algorithms for generative tasks beyond super-resolution, drawing on the interplay between adversarial training and probabilistic inference.

In summary, the paper addresses a key limitation of MSE-based super-resolution, its tendency to produce blurry outputs, by reframing the task as amortised MAP inference, and it offers both a practical architecture and a theoretical link to variational inference.