MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment (2204.08958v2)

Published 19 Apr 2022 in cs.CV and eess.IV

Abstract: No-Reference Image Quality Assessment (NR-IQA) aims to assess the perceptual quality of images in accordance with human subjective perception. Unfortunately, existing NR-IQA methods are far from meeting the needs of predicting accurate quality scores on GAN-based distortion images. To this end, we propose Multi-dimension Attention Network for no-reference Image Quality Assessment (MANIQA) to improve the performance on GAN-based distortion. We firstly extract features via ViT, then to strengthen global and local interactions, we propose the Transposed Attention Block (TAB) and the Scale Swin Transformer Block (SSTB). These two modules apply attention mechanisms across the channel and spatial dimension, respectively. In this multi-dimensional manner, the modules cooperatively increase the interaction among different regions of images globally and locally. Finally, a dual branch structure for patch-weighted quality prediction is applied to predict the final score depending on the weight of each patch's score. Experimental results demonstrate that MANIQA outperforms state-of-the-art methods on four standard datasets (LIVE, TID2013, CSIQ, and KADID-10K) by a large margin. Besides, our method ranked first place in the final testing phase of the NTIRE 2022 Perceptual Image Quality Assessment Challenge Track 2: No-Reference. Codes and models are available at https://github.com/IIGROUP/MANIQA.

Authors (8)
  1. Sidi Yang (6 papers)
  2. Tianhe Wu (6 papers)
  3. Shuwei Shi (12 papers)
  4. Shanshan Lao (5 papers)
  5. Yuan Gong (45 papers)
  6. Mingdeng Cao (22 papers)
  7. Jiahao Wang (88 papers)
  8. Yujiu Yang (155 papers)
Citations (206)

Summary

Overview of "MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment"

The paper introduces a novel approach to No-Reference Image Quality Assessment (NR-IQA) based on a multi-dimension attention network, termed MANIQA. The work addresses the limitations of existing NR-IQA methods, particularly their poor accuracy in predicting quality scores for images with GAN-based distortions. MANIQA aims to assess the perceptual quality of images more accurately, in closer alignment with human subjective perception.

The core methodology of MANIQA integrates attention mechanisms across the channel and spatial dimensions to strengthen feature interactions within image regions at both global and local scales. The architecture uses a Vision Transformer (ViT) as the feature extractor; the extracted token features are then processed by the Transposed Attention Block (TAB), which enhances interactions across channels, and the Scale Swin Transformer Block (SSTB), which strengthens local spatial interactions. Finally, a dual-branch, patch-weighted quality prediction head aggregates patch-wise quality scores according to their learned weights to produce the final prediction.
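To make the channel-wise attention idea concrete, the following is a minimal PyTorch sketch of transposed attention, in which the attention map is computed between channels (C x C) rather than between tokens (N x N). This is an illustration of the mechanism, not the authors' implementation: the single-head form, layer names, and scaling factor are assumptions.

```python
import torch
import torch.nn as nn


class TransposedAttention(nn.Module):
    """Single-head self-attention applied across the channel dimension:
    the attention map is C x C (channel-to-channel) rather than the usual
    N x N (token-to-token) map."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5  # scaling choice is an assumption, not the paper's

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) -- N ViT tokens, each with C channels
        q, k, v = self.qkv(x).chunk(3, dim=-1)             # each (B, N, C)
        attn = (q.transpose(-2, -1) @ k) * self.scale      # (B, C, C) channel map
        attn = attn.softmax(dim=-1)
        out = (attn @ v.transpose(-2, -1)).transpose(-2, -1)  # back to (B, N, C)
        return self.proj(out)
```

Because the attention map is C x C, every output channel aggregates information from all spatial positions, which is what gives the block its global, cross-channel reach.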

Key Contributions and Results

  1. Transposed Attention Block (TAB): This module adapts attention mechanisms to operate across the channel dimension (sketched above), effectively capturing inter-channel dependencies and fostering global feature aggregation. This contrasts with traditional spatial attention and enhances the feature-representation capacity for assessing image quality.
  2. Scale Swin Transformer Block (SSTB): SSTB exploits local image features through attention within spatial windows, enabling nuanced comprehension of local context within an image, which is crucial for capturing the fine-grained distortions introduced by GANs.
  3. Dual Branch Structure: The integration of separate scoring and weighting branches ensures that the model considers both the saliency and the quality of regions within an image. This dual consideration helps mitigate overfitting by balancing prominent but low-quality image regions against less noticeable high-quality ones (a sketch of the weighted aggregation follows this list).
  4. Empirical Performance: MANIQA demonstrated superior performance compared to state-of-the-art NR-IQA methods across multiple established benchmarks, including the LIVE, TID2013, CSIQ, and KADID-10K datasets. The experiments highlight MANIQA's strength on GAN-based distortions, a challenging case for traditional methods; numerically, it achieves substantial gains in both PLCC (Pearson linear correlation coefficient) and SROCC (Spearman rank-order correlation coefficient) over existing models.
  5. NTIRE 2022 Challenge: The model ranked first in the final testing phase of the NTIRE 2022 Perceptual Image Quality Assessment Challenge Track 2: No-Reference, validating its applicability to real-world distortion scenarios and underscoring the robustness of its design.
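The dual-branch aggregation in item 3 amounts to a weighted average of per-patch scores, q = (Σᵢ wᵢ·sᵢ) / (Σᵢ wᵢ). Below is a minimal PyTorch sketch of such a head; the branch architectures, layer sizes, and names are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn


class DualBranchHead(nn.Module):
    """Sketch of dual-branch, patch-weighted prediction: a scoring branch
    rates each patch, a weighting branch estimates its importance, and the
    image score is the weighted average of patch scores."""

    def __init__(self, dim: int):
        super().__init__()
        self.score_branch = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )
        self.weight_branch = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1), nn.Sigmoid()
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C) -- one feature vector per image patch
        s = self.score_branch(feats).squeeze(-1)    # (B, N) per-patch scores
        w = self.weight_branch(feats).squeeze(-1)   # (B, N) per-patch weights
        return (w * s).sum(dim=-1) / (w.sum(dim=-1) + 1e-8)  # (B,) weighted mean


# Example: 196 patch tokens (14 x 14 grid) with 768 channels, batch of 2.
head = DualBranchHead(dim=768)
quality = head(torch.randn(2, 196, 768))  # -> tensor of shape (2,)
```

Keeping the two branches separate lets the model assign low weight to patches whose features are salient but unreliable indicators of overall quality, instead of letting them dominate a plain average.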

Theoretical and Practical Implications

The development of MANIQA is significant on both theoretical and practical fronts. Theoretically, it advances the understanding of how attention mechanisms can be diversified across dimensions to improve perceptual tasks such as image quality assessment. It underscores the potential of transformers to replace or augment traditional CNN architectures in capturing complex visual representations.

Practically, MANIQA offers a viable solution for industries that rely on automated visual content evaluation, notably social media, surveillance, and autonomous systems, where high volumes of GAN-processed images are prevalent. Its ability to predict perceptual quality in closer alignment with human judgments could improve user experience by intelligently filtering or enhancing low-quality visual data.

Future Directions

The findings open avenues for further exploration of multi-dimensional attention strategies in image processing tasks. Future research could extend MANIQA's architecture to other computer vision domains, optimize computational efficiency, and address new forms of synthetic distortion as GANs evolve. There is also potential in developing interpretability strategies for this and similar models, providing insight into how their quality assessments are made.

In summary, the MANIQA framework presents a substantial contribution to the field of NR-IQA, offering notable improvements in handling GAN-induced image distortions and aligning machine assessments more closely with human visual perception.