NIMA: Neural Image Assessment (1709.05424v2)

Published 15 Sep 2017 in cs.CV

Abstract: Automatically learned quality assessment for images has recently become a hot topic due to its usefulness in a wide variety of applications such as evaluating image capture pipelines, storage techniques and sharing media. Despite the subjective nature of this problem, most existing methods only predict the mean opinion score provided by datasets such as AVA [1] and TID2013 [2]. Our approach differs from others in that we predict the distribution of human opinion scores using a convolutional neural network. Our architecture also has the advantage of being significantly simpler than other methods with comparable performance. Our proposed approach relies on the success (and retraining) of proven, state-of-the-art deep object recognition networks. Our resulting network can be used to not only score images reliably and with high correlation to human perception, but also to assist with adaptation and optimization of photo editing/enhancement algorithms in a photographic pipeline. All this is done without need for a "golden" reference image, consequently allowing for single-image, semantic- and perceptually-aware, no-reference quality assessment.

Summary

The paper introduces a CNN model that predicts the entire distribution of human opinion scores instead of only the mean rating.
It leverages squared Earth Mover's Distance loss to accurately capture perceptual differences and evaluates performance on AVA, TID2013, and LIVE datasets.
The approach has practical implications in photo ranking and image enhancement, balancing computational efficiency with high-quality predictions.

Neural Image Assessment (NIMA): Overview and Implications

The paper "NIMA: Neural Image Assessment" by Hossein Talebi and Peyman Milanfar from Google Research presents a convolutional neural network (CNN) approach to predict the distribution of human opinion scores for image quality assessment (IQA). Differing from traditional mean opinion score (MOS) prediction methods, NIMA predicts the full distribution of scores using a CNN trained on image datasets, specifically focusing on no-reference quality assessment where no golden reference image is available. This approach aims to yield a score distribution that closely correlates with human ratings.

Methodology and Architecture

The fundamental innovation in the NIMA model lies in its use of a CNN to predict the distribution of opinion scores, as opposed to merely predicting the MOS. The paper utilizes CNNs pre-trained on ImageNet for initial weights, which are then fine-tuned on IQA datasets to learn perceptual quality. Specifically, several baseline CNN architectures, including VGG16, Inception-v2, and MobileNet, are evaluated in this context.
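As a rough illustration of what "predicting a distribution" means at the output of such a network, the sketch below turns raw outputs into a probability distribution with a softmax and then derives a mean score and a standard deviation from it. The 10 score buckets (1 to 10, as in AVA-style ratings), the function names, and the softmax output layer are assumptions made for this example; the backbone network itself is not modeled.

```python
import math

def softmax(logits):
    """Turn raw network outputs into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def score_stats(probs):
    """Mean and standard deviation of a distribution over score buckets.

    Buckets are assumed to be the integer scores 1..len(probs).
    """
    scores = range(1, len(probs) + 1)
    mean = sum(s * p for s, p in zip(scores, probs))
    var = sum(((s - mean) ** 2) * p for s, p in zip(scores, probs))
    return mean, math.sqrt(var)

# Hypothetical raw outputs for one image: all zeros gives a uniform distribution.
logits = [0.0] * 10
probs = softmax(logits)
mean, std = score_stats(probs)
print(f"mean={mean:.2f} std={std:.2f}")
```

The mean serves as the single quality score, while the standard deviation indicates how much raters are expected to disagree about the image.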

Loss Function: The model employs the squared Earth Mover's Distance (EMD) loss, which is particularly suited for ordered-class problems like IQA. The EMD loss penalizes deviations between predicted and actual cumulative distribution functions, thereby effectively capturing the nuances in quality perception.
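The comparison of cumulative distribution functions can be sketched in a few lines. This is a minimal pure-Python version, assuming the r-norm form of EMD with r = 2 for the squared variant named above; the function name and the three-bucket toy distributions are illustrative only, and a real training setup would express the same computation in a deep learning framework.

```python
from itertools import accumulate

def emd_loss(p_true, p_pred, r=2):
    """r-norm Earth Mover's Distance between two distributions over ordered buckets."""
    if len(p_true) != len(p_pred):
        raise ValueError("distributions must have the same number of buckets")
    cdf_true = list(accumulate(p_true))  # running sums give the CDF
    cdf_pred = list(accumulate(p_pred))
    n = len(p_true)
    total = sum(abs(a - b) ** r for a, b in zip(cdf_true, cdf_pred))
    return (total / n) ** (1.0 / r)

# Toy example: the true mass sits entirely in the lowest bucket.
truth = [1.0, 0.0, 0.0]
near_miss = [0.0, 1.0, 0.0]  # off by one bucket
far_miss = [0.0, 0.0, 1.0]   # off by two buckets

print(emd_loss(truth, near_miss), emd_loss(truth, far_miss))
```

Because the loss works on CDFs, a prediction that is off by two buckets is penalized more than one that is off by a single bucket, which a plain cross-entropy over unordered classes would not distinguish.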

Experimental Results

Datasets: The model is evaluated using three primary datasets: AVA (Aesthetic Visual Analysis), TID2013 (Tampere Image Database), and LIVE (In the Wild Image Quality Challenge Database). Each of these datasets covers a broad spectrum of quality or aesthetic scores, providing a robust testing ground for the proposed model.

Performance Metrics: The paper reports the linear correlation coefficient (LCC) and Spearman's rank correlation coefficient (SRCC) for both the mean and the standard deviation of the scores, along with the Earth Mover's Distance (EMD) between predicted and ground-truth distributions. It also reports accuracy on binary classification tasks. On the AVA dataset, the Inception-v2 and VGG16 variants reach an LCC of up to 0.636 and an SRCC of up to 0.612.
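For readers unfamiliar with these metrics, the following sketch computes LCC (Pearson correlation), SRCC (Pearson correlation of ranks), and a two-class accuracy from predicted and ground-truth mean scores. The sample numbers are made up for illustration, and the threshold of 5 for splitting images into low and high quality is an assumption of this example rather than something stated above.

```python
import math

def pearson(x, y):
    """Linear correlation coefficient (LCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation coefficient (SRCC)."""
    return pearson(ranks(x), ranks(y))

def binary_accuracy(pred, true, threshold=5.0):
    """Fraction of images placed on the same side of the threshold."""
    hits = sum((p > threshold) == (t > threshold) for p, t in zip(pred, true))
    return hits / len(pred)

# Made-up mean scores for five images.
true_means = [3.2, 5.8, 4.9, 7.1, 6.0]
pred_means = [4.0, 5.5, 5.2, 6.3, 6.1]
print(pearson(pred_means, true_means),
      spearman(pred_means, true_means),
      binary_accuracy(pred_means, true_means))
```

SRCC only looks at ordering, so a model can reach a perfect SRCC while its LCC and classification accuracy remain below 1, as happens with the toy numbers here.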

Cross-Dataset Evaluation: The robustness of the NIMA model is further highlighted in cross-dataset evaluations. Models trained on the AVA dataset show strong performance when tested on both TID2013 and LIVE datasets, indicating good generalization capabilities.

Applications and Practical Implications

Photo Ranking: Because NIMA predicts a detailed score distribution, it can rank photos by predicted perceptual quality. The paper compares predicted scores with ground-truth ratings to show that the model discerns subtle aesthetic differences between images.
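A ranking step on top of predicted distributions could look like the sketch below. It assumes that photos are ordered by the mean of their predicted distribution over scores 1 to 10; the file names and distributions are invented, and the model that would produce them is not shown.

```python
def mean_score(probs):
    """Mean of a distribution over the integer scores 1..len(probs)."""
    return sum(score * p for score, p in enumerate(probs, start=1))

def rank_photos(predictions):
    """Return photo names ordered from highest to lowest predicted mean score."""
    return sorted(predictions,
                  key=lambda name: mean_score(predictions[name]),
                  reverse=True)

# Hypothetical predicted distributions for three photos.
predictions = {
    "a.jpg": [0.0, 0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0],
    "b.jpg": [0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0],
    "c.jpg": [0.1, 0.3, 0.4, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}
ranking = rank_photos(predictions)
print(ranking)
```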

Image Enhancement: One of the critical applications highlighted is image enhancement. The NIMA model can be integrated into photo editing pipelines to optimize parameters for tone enhancement and denoising operations. For instance, it can guide the selection of parameters for multi-layer Laplacian filters and Turbo denoising to enhance images perceptually.
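One simple way to use a quality score for parameter selection is to try several candidate values and keep the one whose output scores highest. The sketch below shows that idea with toy stand-ins: `toy_enhance` and `toy_score` are hypothetical placeholders for a real enhancement operator and a trained quality model, and the exhaustive search over candidates is an assumption of this example, not necessarily the procedure used in the paper.

```python
def best_parameter(image, enhance, score, candidates):
    """Return (parameter, score) for the candidate whose output scores highest."""
    best = None
    for value in candidates:
        s = score(enhance(image, value))
        if best is None or s > best[1]:
            best = (value, s)
    return best

def toy_enhance(image, gain):
    """Placeholder enhancement: scale pixel values and clip at 1.0."""
    return [min(1.0, px * gain) for px in image]

def toy_score(image):
    """Placeholder quality score: highest when average brightness is 0.5."""
    avg = sum(image) / len(image)
    return 1.0 - abs(avg - 0.5)

image = [0.1, 0.2, 0.3]          # a tiny grayscale "image"
candidates = [1.0, 2.0, 2.5, 4.0]
gain, quality = best_parameter(image, toy_enhance, toy_score, candidates)
print(gain, quality)
```

Swapping `toy_score` for a function that returns the mean of a predicted score distribution would turn this into a no-reference tuning loop, at the cost of one model evaluation per candidate.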

Computational Considerations

The paper also addresses the computational complexity of the models. Among the evaluated architectures, MobileNet is the fastest, offering significantly lower computational cost at the expense of a slight reduction in performance. This tradeoff is particularly valuable for real-time applications on mobile devices.

Theoretical and Future Directions

The theoretical contribution of the NIMA model includes the validation of EMD loss in the context of perceptual IQA, reinforcing its utility for ordered-class tasks. For future work, the paper suggests leveraging the NIMA model as a loss function for training enhancement algorithms, which could potentially streamline the optimization process and reduce computational overhead.

Conclusions

In summary, NIMA advances no-reference IQA by predicting detailed score distributions rather than a single mean score. Its integration into enhancement pipelines underscores its practical benefits, while its performance across multiple datasets supports the use of EMD loss for ordered-class quality prediction. Models like NIMA could pave the way for more nuanced and human-like perceptual assessments in image processing.

Overall, "NIMA: Neural Image Assessment" enhances our understanding of image quality evaluation, presenting a versatile and effective approach that holds promise for various practical applications in photography and computer vision.
