G-Refine: A General Quality Refiner for Text-to-Image Generation (2404.18343v1)

Published 29 Apr 2024 in cs.MM and cs.CV

Abstract: As Text-to-Image (T2I) models evolve, the quality defects of AI-Generated Images (AIGIs) remain a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compromising the integrity of high-quality ones. The model is composed of three interconnected modules: a perception quality indicator, an alignment quality indicator, and a general quality enhancement module. Drawing on the mechanisms of the Human Visual System (HVS) and syntax trees, the first two indicators identify perception and alignment deficiencies respectively, and the last module applies targeted quality enhancement accordingly. Extensive experiments show that, compared with alternative optimization methods, AIGIs refined by G-Refine achieve better results on 10+ quality metrics across 4 databases. This improvement significantly contributes to the practical application of contemporary T2I models, paving the way for their broader adoption. The code will be released on https://github.com/Q-Future/Q-Refine.
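
The abstract describes a three-module pipeline: two quality indicators (perception and alignment) followed by an enhancement step that is applied only when a deficiency is found, so that already high-quality images are left untouched. The sketch below illustrates that control flow only; the component names, signatures, and threshold values are hypothetical stand-ins, not the paper's actual implementation.

```python
# Minimal sketch of the three-module control flow described in the abstract.
# All component callables and thresholds here are illustrative assumptions,
# not the released G-Refine code.

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class QualityReport:
    perception: float  # HVS-inspired perceptual quality score in [0, 1]
    alignment: float   # prompt-image alignment score in [0, 1]


def g_refine_pipeline(
    image: Any,
    prompt: str,
    perception_indicator: Callable[[Any], float],
    alignment_indicator: Callable[[Any, str], float],
    enhancer: Callable[[Any, str, QualityReport], Any],
    perc_thresh: float = 0.7,
    align_thresh: float = 0.7,
) -> Any:
    """Score the image with both indicators, then enhance only if a
    deficiency is detected, leaving high-quality inputs unchanged."""
    report = QualityReport(
        perception=perception_indicator(image),
        alignment=alignment_indicator(image, prompt),
    )
    if report.perception >= perc_thresh and report.alignment >= align_thresh:
        return image  # already high quality: avoid risking degradation
    # Pass both deficiency signals so the enhancer can target the weaker aspect.
    return enhancer(image, prompt, report)


# Usage with trivial stand-in components (purely illustrative):
if __name__ == "__main__":
    refined = g_refine_pipeline(
        image="low_quality_image",
        prompt="a photo of a cat",
        perception_indicator=lambda img: 0.4,
        alignment_indicator=lambda img, p: 0.9,
        enhancer=lambda img, p, rep: f"refined({img})",
    )
    print(refined)  # -> refined(low_quality_image)
```

The gating step reflects the stated design goal of enhancing low-quality images "without compromising the integrity of high-quality ones"; the thresholds above are placeholders for whatever decision rule the actual indicators use.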
