CoSeR: Bridging Image and Language for Cognitive Super-Resolution (2311.16512v4)
Abstract: Existing super-resolution (SR) models primarily focus on restoring local texture details, often neglecting the global semantic information within the scene. This oversight can lead to the omission of crucial semantic details or the introduction of inaccurate textures during the recovery process. In our work, we introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images. We achieve this by marrying image appearance and language understanding to generate a cognitive embedding, which not only activates prior information from large text-to-image diffusion models but also facilitates the generation of high-quality reference images to optimize the SR process. To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention", consolidating all conditional information into a single module. Consequently, our method successfully restores semantically correct and photorealistic details, demonstrating state-of-the-art performance across multiple benchmarks. Code: https://github.com/VINHYU/CoSeR
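The abstract does not say how "All-in-Attention" combines its inputs, so the following is only a minimal sketch of one way a single attention module could take in several conditions at once. It assumes each condition is a short sequence of feature tokens and that the sequences are concatenated and used as both keys and values. The function names, the toy tokens, and the absence of learned projections are illustrative assumptions, not the released CoSeR implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context):
    """Single-head scaled dot-product attention.

    Context tokens serve as both keys and values; a real model would
    apply learned query/key/value projections, which are omitted here.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, context)) for i in range(dim)]
        )
    return outputs


def all_conditions_attention(queries, *condition_sets):
    """Concatenate the tokens of every condition and attend over them in one call."""
    context = [token for cond in condition_sets for token in cond]
    return attend(queries, context)


# Hypothetical toy inputs: two image-feature queries and three condition sources.
queries = [[1.0, 0.0], [0.0, 1.0]]
lr_tokens = [[1.0, 0.0]]    # stand-in for low-resolution image features
ref_tokens = [[0.0, 1.0]]   # stand-in for reference-image features
cog_tokens = [[0.5, 0.5]]   # stand-in for the cognitive embedding
fused = all_conditions_attention(queries, lr_tokens, ref_tokens, cog_tokens)
```

Each row of `fused` is a weighted average of all condition tokens, so a query draws most heavily on the condition it most resembles while the others still contribute. The paper itself describes how the actual module is wired into the diffusion backbone.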
Authors: Haoze Sun, Wenbo Li, Jianzhuang Liu, Haoyu Chen, Renjing Pei, Xueyi Zou, Youliang Yan, Yujiu Yang