Image Quality Assessment: Unifying Structure and Texture Similarity (2004.07728v3)

Published 16 Apr 2020 in cs.CV

Abstract: Objective measures of image quality generally operate by comparing pixels of a "degraded" image to those of the original. Relative to human observers, these measures are overly sensitive to resampling of texture regions (e.g., replacing one patch of grass with another). Here, we develop the first full-reference image quality model with explicit tolerance to texture resampling. Using a convolutional neural network, we construct an injective and differentiable function that transforms images to multi-scale overcomplete representations. We demonstrate empirically that the spatial averages of the feature maps in this representation capture texture appearance, in that they provide a set of sufficient statistical constraints to synthesize a wide variety of texture patterns. We then describe an image quality method that combines correlations of these spatial averages ("texture similarity") with correlations of the feature maps ("structure similarity"). The parameters of the proposed measure are jointly optimized to match human ratings of image quality, while minimizing the reported distances between subimages cropped from the same texture images. Experiments show that the optimized method explains human perceptual scores, both on conventional image quality databases, as well as on texture databases. The measure also offers competitive performance on related tasks such as texture classification and retrieval. Finally, we show that our method is relatively insensitive to geometric transformations (e.g., translation and dilation), without use of any specialized training or data augmentation. Code is available at https://github.com/dingkeyan93/DISTS.

Authors

Keyan Ding
Kede Ma
Shiqi Wang
Eero P. Simoncelli


Summary

The paper introduces the DISTS model that unifies structure and texture similarity to overcome traditional IQA limitations.
The model employs a multi-scale, overcomplete representation via a modified VGG network to extract holistic image features.
Experiments show that the model explains human perceptual scores on both conventional IQA databases and texture databases, and that it is relatively insensitive to geometric transformations such as translation and dilation.

Overview of "Image Quality Assessment: Unifying Structure and Texture Similarity"

The paper "Image Quality Assessment: Unifying Structure and Texture Similarity" presents a novel full-reference Image Quality Assessment (IQA) model that effectively integrates structure and texture similarity to address limitations in existing methodologies. The proposed Deep Image Structure and Texture Similarity (DISTS) model offers a comprehensive approach that aligns more closely with human perceptual assessments of image quality, particularly in handling texture resampling challenges.

Key Contributions

Texture Insensitivity: Traditional IQA models compare pixels of a degraded image to those of the original, which makes them overly sensitive to resampling of texture regions (e.g., replacing one patch of grass with another). DISTS is built with explicit tolerance to texture resampling, so its scores on textured regions track human perception more closely.
Innovative Framework: The model utilizes a multi-scale, overcomplete representation created via a modified VGG network. This representation captures both structural details and texture appearance, balancing these aspects to provide holistic image quality assessments.
Empirical Validation: In the reported experiments, the optimized DISTS measure explains human perceptual scores on both conventional IQA databases and texture databases.

Methodology

Feature Representation: A convolutional neural network is used to construct an injective and differentiable function that maps an input image to a multi-scale overcomplete representation. The spatial averages of the feature maps are shown empirically to capture texture appearance: they provide sufficient statistical constraints to synthesize a wide variety of texture patterns.
Similarity Measures: Texture similarity is computed from correlations of the spatial averages of the feature maps, and structure similarity from correlations of the feature maps themselves. The two terms are combined into a single quality measure with learned weights.
Optimization: The parameters of the measure are jointly optimized to match human ratings of image quality while minimizing the reported distances between subimages cropped from the same texture image.
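As an illustration of the two similarity terms, here is a minimal NumPy sketch. It follows the abstract's description (texture similarity from spatial averages, structure similarity from correlations of the feature maps) and uses SSIM-style ratios for both terms. The function names, the uniform weights, the stabilizing constants, and the random arrays standing in for network feature maps are all assumptions made for this example; it is not the authors' implementation, and the real model learns its weights and computes features with a modified VGG network.

```python
import numpy as np

def texture_similarity(fx, fy, c1=1e-6):
    """Per-channel similarity of spatial means for (C, H, W) feature maps."""
    mx = fx.mean(axis=(1, 2))
    my = fy.mean(axis=(1, 2))
    return (2 * mx * my + c1) / (mx**2 + my**2 + c1)

def structure_similarity(fx, fy, c2=1e-6):
    """Per-channel similarity based on the covariance of two feature maps."""
    mx = fx.mean(axis=(1, 2), keepdims=True)
    my = fy.mean(axis=(1, 2), keepdims=True)
    var_x = ((fx - mx) ** 2).mean(axis=(1, 2))
    var_y = ((fy - my) ** 2).mean(axis=(1, 2))
    cov_xy = ((fx - mx) * (fy - my)).mean(axis=(1, 2))
    return (2 * cov_xy + c2) / (var_x + var_y + c2)

def toy_distance(fx, fy, alpha=None, beta=None):
    """1 minus a weighted sum of both terms (weights sum to 1 by assumption)."""
    channels = fx.shape[0]
    if alpha is None:
        alpha = np.full(channels, 0.5 / channels)  # hypothetical uniform weights
    if beta is None:
        beta = np.full(channels, 0.5 / channels)   # hypothetical uniform weights
    l = texture_similarity(fx, fy)
    s = structure_similarity(fx, fy)
    return 1.0 - float(np.sum(alpha * l + beta * s))

# Random arrays stand in for feature maps (assumption, no network involved).
rng = np.random.default_rng(0)
feats = rng.random((4, 16, 16))

# Crude stand-in for texture resampling: shuffle spatial positions.
perm = rng.permutation(16 * 16)
resampled = feats.reshape(4, -1)[:, perm].reshape(4, 16, 16)

print("distance to itself:     ", toy_distance(feats, feats))
print("distance after shuffle: ", toy_distance(feats, resampled))
print("texture term after shuffle:", texture_similarity(feats, resampled))
```

In this toy setting the texture term is unaffected by the shuffle because spatial means ignore position, while the structure term drops. Raising the texture weights relative to the structure weights therefore makes the distance more tolerant of resampling, which is the trade-off the learned weights are meant to settle.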

Results and Performance

The optimized DISTS measure explains human perceptual scores on both conventional IQA databases and texture databases. It also offers competitive performance on related tasks such as texture classification and retrieval. Finally, the measure is relatively insensitive to geometric transformations such as translation and dilation, without any specialized training or data augmentation.
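One intuition for this insensitivity is that spatial averages discard position. The toy sketch below, which is an idealized illustration and not an experiment from the paper, circularly shifts a hypothetical stack of feature maps and compares a pointwise error with the change in per-channel spatial means. The array shapes, the shift size, and the function name `shift_effects` are assumptions; real translations also bring new content in at the image borders, which a circular shift ignores.

```python
import numpy as np

def shift_effects(feat, shift):
    """Return (pointwise MSE, largest change in per-channel spatial mean)
    after circularly shifting (C, H, W) feature maps along the width axis."""
    shifted = np.roll(feat, shift=shift, axis=2)
    mse = float(np.mean((feat - shifted) ** 2))
    mean_gap = float(np.max(np.abs(feat.mean(axis=(1, 2)) - shifted.mean(axis=(1, 2)))))
    return mse, mean_gap

rng = np.random.default_rng(0)
feat = rng.random((8, 32, 32))  # hypothetical feature maps, not network output

mse, mean_gap = shift_effects(feat, shift=3)
print("pointwise MSE after shift:   ", mse)
print("max change in spatial means: ", mean_gap)
```

The pointwise error is clearly nonzero while the spatial means are unchanged up to floating-point rounding, so a term built on those means does not penalize the shift.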

Implications and Future Directions

Practical Applications: DISTS can serve as a quality metric for image processing tasks that require perceptual fidelity, such as compression and restoration, where a metric aligned with human visual judgments is preferable to pixel-wise error.
Theoretical Insights: The balance of structure and texture similarity in DISTS offers a noteworthy contribution towards understanding human perception in computational models, paving the way for future research in texture perception and synthesis.
Possible Extensions: Future developments could explore adaptive local assessments, further enhancing the model's versatility in fine-grained quality measurement.

By combining structure and texture similarity in a single measure, DISTS brings computational quality assessment closer to human visual perception, particularly for textured image content.
