
Text-guided Explorable Image Super-resolution (2403.01124v1)

Published 2 Mar 2024 in cs.CV

Abstract: In this paper, we introduce the problem of zero-shot text-guided exploration of the solutions to open-domain image super-resolution. Our goal is to allow users to explore diverse, semantically accurate reconstructions that preserve data consistency with the low-resolution inputs for different large downsampling factors without explicitly training for these specific degradations. We propose two approaches for zero-shot text-guided super-resolution - i) modifying the generative process of text-to-image (T2I) diffusion models to promote consistency with low-resolution inputs, and ii) incorporating language guidance into zero-shot diffusion-based restoration methods. We show that the proposed approaches result in diverse solutions that match the semantic meaning provided by the text prompt while preserving data consistency with the degraded inputs. We evaluate the proposed baselines for the task of extreme super-resolution and demonstrate advantages in terms of restoration quality, diversity, and explorability of solutions.
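
The first approach described in the abstract (modifying the generative process of a T2I diffusion model to promote consistency with the low-resolution input) can be sketched as a data-consistency projection applied at every sampling step, in the spirit of zero-shot methods such as ILVR or DDNM. The sketch below is illustrative only and is not the paper's implementation; the names denoiser, prompt_embedding, and alphas_cumprod are hypothetical placeholders, and bicubic upsampling stands in for an approximate pseudo-inverse of the downsampling operator.

    # Minimal sketch (not the authors' code): text-conditioned diffusion sampling
    # with a low-resolution data-consistency projection at every step.
    import torch
    import torch.nn.functional as F

    def downsample(x, factor):
        """Degradation operator A: bicubic downsampling by `factor`."""
        return F.interpolate(x, scale_factor=1.0 / factor, mode="bicubic",
                             align_corners=False)

    def upsample(x, factor):
        """Approximate pseudo-inverse of A: bicubic upsampling by `factor`."""
        return F.interpolate(x, scale_factor=factor, mode="bicubic",
                             align_corners=False)

    @torch.no_grad()
    def text_guided_sr_sampling(denoiser, y_lr, prompt_embedding,
                                alphas_cumprod, factor=16):
        """Zero-shot sampling loop: predict a clean image x0_hat with the
        text-conditioned denoiser, project it onto the set of images consistent
        with the observed LR image y_lr, then take a DDIM-style step."""
        T = len(alphas_cumprod)
        x_t = torch.randn(y_lr.shape[0], y_lr.shape[1],
                          y_lr.shape[2] * factor, y_lr.shape[3] * factor,
                          device=y_lr.device)
        for t in reversed(range(T)):
            a_t = alphas_cumprod[t]
            # Text-conditioned noise prediction (placeholder signature).
            eps = denoiser(x_t, t, prompt_embedding)
            # Predicted clean image from the current noisy sample.
            x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
            # Data consistency: keep the components explained by y_lr and let the
            # diffusion prior (steered by the prompt) fill in the remaining detail.
            x0_hat = x0_hat - upsample(downsample(x0_hat, factor) - y_lr, factor)
            # Deterministic DDIM step back to t-1.
            a_prev = (alphas_cumprod[t - 1] if t > 0
                      else torch.tensor(1.0, device=x_t.device))
            x_t = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps
        return x_t

In practice, denoiser would be a pretrained text-conditioned diffusion network and y_lr the degraded observation; passing a dummy denoiser such as lambda x, t, c: torch.zeros_like(x) with a monotonically decreasing alphas_cumprod schedule is enough to check that the shapes and the projection step behave as expected.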
