Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder (2403.10255v1)
Abstract: Super-resolution (SR) and image generation are important tasks in computer vision and are widely adopted in real-world applications. Most existing methods, however, generate images only at a fixed magnification scale and suffer from over-smoothing and artifacts. Additionally, they offer neither enough diversity in their output images nor consistent images at different scales. The most relevant prior work applied Implicit Neural Representation (INR) to a denoising diffusion model to obtain continuous-resolution, diverse, and high-quality SR results. Because that model operates in image space, however, the larger the output resolution, the more memory and inference time it requires, and it does not maintain consistency across scales. We propose a novel pipeline that can super-resolve an input image, or generate a novel image from random noise, at arbitrary scales. The method consists of a pretrained auto-encoder, a latent diffusion model, and an implicit neural decoder, together with their learning strategies. The proposed method runs the diffusion process in a latent space, which makes it efficient, while keeping it aligned with the output image space decoded by MLPs at arbitrary scales. More specifically, our arbitrary-scale decoder consists of the symmetric decoder of the pretrained auto-encoder, without its up-scaling layers, followed in series by a Local Implicit Image Function (LIIF). The latent diffusion process is learned jointly with a denoising loss and an alignment loss. Errors in the output images are backpropagated through the fixed decoder, improving the quality of the output images. In extensive experiments on multiple public benchmarks for the two tasks, i.e., image super-resolution and novel image generation at arbitrary scales, the proposed method outperforms relevant methods in image quality, diversity, and scale consistency. It is also significantly faster at inference and uses less memory than the most relevant prior art.
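The arbitrary-scale decoding step can be illustrated with a simplified LIIF-style query. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. A random 16x16 feature map stands in for the output of the auto-encoder's decoder without up-scaling, and a two-layer MLP with random weights stands in for the trained LIIF network. The helper names (`make_coord`, `liif_query`, `decode`) are hypothetical. For each output pixel center, the sketch looks up the nearest feature vector and concatenates it with two quantities: the pixel's offset from that feature's center and the output cell size. The MLP then decodes this vector, so the same feature grid can be rendered at any resolution. The sketch omits the feature unfolding and four-neighbor local ensemble of the full LIIF.

```python
import numpy as np


def make_coord(h, w):
    """Pixel-center coordinates in [-1, 1] for an h x w grid, shape (h*w, 2) as (y, x)."""
    ys = -1 + (2 * np.arange(h) + 1) / h
    xs = -1 + (2 * np.arange(w) + 1) / w
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy, xx], axis=-1).reshape(-1, 2)


def mlp(x, weights):
    """ReLU MLP; the last layer is linear (hypothetical stand-in for the LIIF MLP)."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x


def liif_query(feat, coords, out_hw, weights):
    """Decode values at continuous coords from a feature grid feat of shape (h, w, c)."""
    h, w, _ = feat.shape
    # Index of the feature cell that contains each query coordinate.
    iy = np.clip(np.floor((coords[:, 0] + 1) / 2 * h).astype(int), 0, h - 1)
    ix = np.clip(np.floor((coords[:, 1] + 1) / 2 * w).astype(int), 0, w - 1)
    nearest = feat[iy, ix]  # (n, c)
    centers = np.stack([-1 + (2 * iy + 1) / h, -1 + (2 * ix + 1) / w], axis=-1)
    # Offset from the feature center, rescaled so one feature cell spans [-1, 1].
    rel = (coords - centers) * np.array([h, w])
    # Size of one output pixel in the same rescaled units ("cell decoding").
    cell = np.tile([2.0 / out_hw[0] * h, 2.0 / out_hw[1] * w], (len(coords), 1))
    return mlp(np.concatenate([nearest, rel, cell], axis=-1), weights)


def decode(feat, out_h, out_w, weights):
    """Render the feature grid at an arbitrary output resolution."""
    coords = make_coord(out_h, out_w)
    return liif_query(feat, coords, (out_h, out_w), weights).reshape(out_h, out_w, -1)


rng = np.random.default_rng(0)
c, hidden = 8, 16
weights = [
    (rng.normal(size=(c + 4, hidden)) * 0.1, np.zeros(hidden)),
    (rng.normal(size=(hidden, 3)) * 0.1, np.zeros(3)),
]
feat = rng.normal(size=(16, 16, c))  # stand-in for decoder features (no up-scaling)
img_x2 = decode(feat, 32, 32, weights)  # 2x
img_arbitrary = decode(feat, 59, 59, weights)  # non-integer: 59 / 16 = 3.6875x
```

The output size enters only through the query coordinates and the cell size. As a result, a non-integer factor such as 3.6875x needs no separate model in this sketch.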
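The joint objective (a denoising loss plus an alignment loss, with image errors passed back through the fixed decoder) can be sketched in the same spirit. The abstract does not give the exact form of the alignment loss, so the version below is an assumption. It combines the standard DDPM noise-prediction loss on the latent with an L1 loss between the target image and the decoded estimate of the clean latent, weighted by a hypothetical `lam`. The clean-latent estimate solves the forward-diffusion identity z_t = sqrt(a_t) * z0 + sqrt(1 - a_t) * eps for z0, using the predicted noise. The schedule, `toy_decoder`, and both denoisers are illustrative stand-ins. NumPy only evaluates the loss; in training, an autodiff framework would carry the alignment gradient through the frozen decoder into the denoiser.

```python
import numpy as np


def joint_loss(z0, x, t, alpha_bars, denoiser, decoder, rng, lam=1.0):
    """Denoising loss plus an image-space alignment loss (assumed form).

    z0: clean latent from the frozen encoder; x: target image at the
    decoder's output size; alpha_bars: cumulative noise schedule.
    """
    a = alpha_bars[t]
    eps = rng.standard_normal(z0.shape)
    # Forward diffusion in latent space: z_t = sqrt(a) * z0 + sqrt(1 - a) * eps.
    z_t = np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps
    eps_hat = denoiser(z_t, t)
    l_denoise = np.mean((eps - eps_hat) ** 2)
    # Same identity solved for z0, using the predicted noise.
    z0_hat = (z_t - np.sqrt(1.0 - a) * eps_hat) / np.sqrt(a)
    # Hypothetical L1 alignment term: decode the estimate and compare with the
    # target image. NumPy only evaluates it; no gradient is computed here.
    l_align = np.mean(np.abs(decoder(z0_hat) - x))
    return l_denoise + lam * l_align


def toy_decoder(z):
    """Hypothetical frozen decoder: first 3 channels, nearest-neighbor 2x upsampling."""
    return np.repeat(np.repeat(z[:3], 2, axis=1), 2, axis=2)


rng = np.random.default_rng(0)
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # linear beta schedule
z0 = rng.standard_normal((4, 8, 8))
x = toy_decoder(z0)


def oracle_denoiser(z_t, t):
    # Recovers the exact noise from the known z0, so both loss terms vanish.
    return (z_t - np.sqrt(alpha_bars[t]) * z0) / np.sqrt(1.0 - alpha_bars[t])


def zero_denoiser(z_t, t):
    # Predicts no noise at all; both loss terms are large.
    return np.zeros_like(z_t)


loss_oracle = joint_loss(z0, x, 500, alpha_bars, oracle_denoiser, toy_decoder, rng)
loss_zero = joint_loss(z0, x, 500, alpha_bars, zero_denoiser, toy_decoder, rng)
```

A perfect noise prediction drives both terms to zero. A poor one is penalized twice: once in latent space and once after decoding to image space.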