KonIQ-10k: An Ecologically Valid Database for Deep Learning of Blind Image Quality Assessment
The paper presents KonIQ-10k, at the time of publication the largest in-the-wild image quality assessment (IQA) database, built to enable deep learning for blind image quality assessment (BIQA). The authors give a detailed account of how the dataset was constructed and propose a model, KonCept512, which substantially improves BIQA performance and generalizes well to other datasets.
Dataset Creation and Characteristics
KonIQ-10k comprises 10,073 images annotated with roughly 1.2 million quality ratings collected through crowdsourcing, a scale that surpasses previous databases. The dataset emphasizes ecological validity: distortions are authentic, arising in the wild rather than being synthetically applied, and content is diverse. Starting from YFCC100M, the authors filtered candidates with a tag-based sampling procedure, re-scaled and cropped them, and selected the final set so that quality-related indicators (such as brightness, colorfulness, contrast, and sharpness) are balanced across the database. The Viola-Jones face detector, alongside saliency measures, was used to verify that images retain meaningful content after cropping.
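To make the balancing step concrete, below is a minimal sketch of stratified sampling over one quality indicator. It is illustrative only: the indicator values, bin count, and target size are placeholder assumptions, and the paper balances several indicators (plus content diversity) jointly rather than one at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder candidate pool: one row per image, one column per indicator
# (e.g. brightness, colorfulness, contrast, sharpness). Values are random
# stand-ins for scores computed on real images.
n_candidates = 100_000
indicators = rng.random((n_candidates, 4))

def stratified_sample(values, n_target, n_bins=10):
    """Select indices so the chosen `values` are roughly uniform over their range."""
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    bin_ids = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
    per_bin = n_target // n_bins
    chosen = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_ids == b)
        take = min(per_bin, members.size)  # a sparse bin contributes what it has
        chosen.extend(rng.choice(members, size=take, replace=False))
    return np.asarray(chosen)

# Balance on a single indicator here; the actual procedure trades off
# several indicators at once.
selected = stratified_sample(indicators[:, 0], n_target=10_073)
print(f"selected {selected.size} images")
```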
Model Proposal and Evaluation
KonCept512, the proposed deep learning model, uses an InceptionResNet-v2 backbone trained end-to-end on 512×384-pixel images, a considerably higher resolution than the small crops used by most prior approaches. The model achieves an SROCC of $0.921$ on the KonIQ-10k test set and $0.825$ on LIVE-in-the-Wild, demonstrating strong cross-database generalization. The loss-function experiments show that MSE performs slightly better on the KonIQ-10k test set, while Huber loss transfers better in cross-database evaluation, underscoring that the choice of loss function matters.
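The following is a minimal sketch of a KonCept512-style regressor, assuming PyTorch and the timm implementation of InceptionResNet-v2 (an assumption; the paper's codebase may differ). The head layer sizes and dropout rate are illustrative, not the paper's exact hyperparameters.

```python
import torch
import torch.nn as nn
import timm  # assumed dependency providing InceptionResNet-v2

class KonCept512Sketch(nn.Module):
    """KonCept512-style quality regressor: CNN backbone + small MLP head."""

    def __init__(self, pretrained=False):
        super().__init__()
        # num_classes=0 makes timm return globally pooled features
        # (1536-dimensional for inception_resnet_v2).
        self.backbone = timm.create_model(
            "inception_resnet_v2", pretrained=pretrained, num_classes=0
        )
        feat_dim = self.backbone.num_features
        # Head sizes and dropout are illustrative assumptions.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(), nn.Dropout(0.25),
            nn.Linear(1024, 256), nn.ReLU(), nn.Dropout(0.25),
            nn.Linear(256, 1),  # single predicted quality score (MOS)
        )

    def forward(self, x):
        return self.head(self.backbone(x)).squeeze(-1)

model = KonCept512Sketch()
x = torch.randn(2, 3, 384, 512)  # full 512x384 input (H=384, W=512)
scores = model(x)                # shape (2,), one quality score per image

# The loss comparison in the paper amounts to swapping the criterion:
mse = nn.MSELoss()      # slightly better on the KonIQ-10k test split
huber = nn.HuberLoss()  # transferred better in cross-database evaluation
```

Training at the full 512×384 resolution, rather than on small crops, is what lets the backbone see the authentic distortions in context; the fully convolutional backbone with global pooling accommodates the non-square input without modification.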
Implications and Future Work
This research contributes to BIQA by providing a larger and more representative corpus for training and evaluating deep models. KonIQ-10k's size and diversity capture real-world image distortions more faithfully than databases of artificially distorted images, pushing models to generalize better. KonCept512's margin over prior state-of-the-art methods indicates clear progress, while the remaining gap to human agreement leaves room for further improvement.
Future work could expand the training data: the authors' extrapolation suggests that a model trained on roughly 100,000 images could close much of the gap between machine predictions and human perception. Exploring stronger architectures or alternative loss functions may further improve accuracy across diverse image collections, moving BIQA models closer to being practical substitutes for subjective assessment in imaging applications.
Conclusion
KonIQ-10k marks a significant step forward in blind image quality assessment. With a large, ecologically valid dataset and a strong baseline model, the paper sets a benchmark for future work on more generalizable and scalable IQA solutions, and offers guidance on how datasets and models should evolve to handle real-world data.