Blind Image Quality Assessment Using a Deep Bilinear Convolutional Neural Network
The paper addresses blind image quality assessment (BIQA) by proposing a Deep Bilinear Convolutional Neural Network (DB-CNN) designed to handle both synthetic and authentic image distortions. The approach uses two convolutional neural networks (CNNs), each specializing in one distortion scenario. For synthetic distortions, a CNN is pre-trained on a large-scale dataset to classify distortion types and levels. For authentic distortions, which are more complex and varied, a CNN pre-trained for image classification is used.
Methodology
The proposed DB-CNN model employs a two-stream architecture aimed at modeling synthetic and authentic distortions as two-factor variations:
- Synthetic Distortions Handling:
  - The synthetic distortions stream (S-CNN) is pre-trained on a dataset built by merging images from the Waterloo Exploration Database and the PASCAL VOC Database and applying synthetic distortions to them, yielding over 850,000 distorted images.
  - The pre-training task is to classify distortion type and level. These labels come directly from the distortion generation process, so no human ratings are needed, and the resulting weights give the stream a robust initialization.
- Authentic Distortions Handling:
  - The authentic distortions stream uses a VGG-16 network pre-trained for image classification.
  - Its high-level features are relevant to the authentic distortions encountered in real-world photographic images.
- Bilinear Pooling:
  - DB-CNN combines the features from the two streams using bilinear pooling.
  - The bilinear representation gives a unified feature set on which the quality predictor is fine-tuned.
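The bilinear pooling step can be illustrated with a minimal, dependency-free sketch. It assumes both streams produce feature maps over the same spatial positions, and it applies a signed square root followed by L2 normalization, which is common post-processing in bilinear CNNs and is an assumption here rather than something this summary states. The function name `bilinear_pool` and the toy inputs are hypothetical; a real implementation would use a tensor library.

```python
import math


def bilinear_pool(feat_a, feat_b):
    """Bilinear-pool two feature maps defined over the same spatial positions.

    feat_a: list of positions, each a list of C_a channel activations
    feat_b: list of positions, each a list of C_b channel activations
    Returns a flat vector of length C_a * C_b.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("both streams must cover the same spatial positions")
    c_a, c_b = len(feat_a[0]), len(feat_b[0])
    pooled = [0.0] * (c_a * c_b)

    # Sum the outer product of the two channel vectors over all positions.
    for vec_a, vec_b in zip(feat_a, feat_b):
        for i, a in enumerate(vec_a):
            for j, b in enumerate(vec_b):
                pooled[i * c_b + j] += a * b

    # Assumed post-processing: signed square root, then L2 normalization.
    pooled = [math.copysign(math.sqrt(abs(x)), x) for x in pooled]
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled] if norm > 0 else pooled


# Toy example: 2 spatial positions, 2 channels in one stream, 1 in the other.
stream_a = [[1.0, 0.0], [0.0, 2.0]]
stream_b = [[3.0], [4.0]]
representation = bilinear_pool(stream_a, stream_b)
```

The output length is the product of the two channel counts, so every channel of one stream interacts with every channel of the other. This is what lets a single downstream predictor draw on both feature sets.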
The entire DB-CNN model is then optimized end to end on subject-rated databases using a variant of stochastic gradient descent, so the same architecture can be fitted to either distortion scenario.
Experimental Evaluation
DB-CNN's performance was evaluated on synthetic databases such as LIVE, CSIQ, and TID2013, as well as on the authentic LIVE Challenge Database. The results showcase:
- Robust Performance: DB-CNN achieved high Spearman rank-order correlation (SRCC) and Pearson linear correlation (PLCC) with subjective scores across various distortion types and databases.
- Cross-Database Generalizability: The model generalizes well, outperforming the competing models it was compared against when trained on one database and tested on another.
- Resilience in Distortion Handling: DB-CNN handles both single and multiply distorted scenarios effectively, suggesting a robust framework applicable in varied contexts.
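- Waterloo Exploration Database Testing: On the pristine/distorted image discriminability test (D-Test), the listwise ranking consistency test (L-Test), and the pairwise preference consistency test (P-Test) of the Waterloo Exploration Database, DB-CNN demonstrated competitive discriminability and ranking consistency.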
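The two correlation metrics used in the evaluation can be sketched in plain Python. SRCC is the Pearson correlation of the ranks, with ties given their average rank. The function names and the toy scores are hypothetical and are not results from the paper. Some IQA studies fit a nonlinear mapping to the predictions before computing PLCC; this sketch omits that step.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation coefficient (SRCC)."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical model predictions and mean opinion scores for four images.
predicted = [0.10, 0.40, 0.35, 0.80]
mos = [10.0, 40.0, 30.0, 90.0]
srcc = spearman(predicted, mos)
plcc = pearson(predicted, mos)
```

SRCC measures only whether the predicted ordering matches the subjective ordering, while PLCC also rewards a linear relationship between the two scales, which is why papers typically report both.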
Implications and Future Directions
DB-CNN shows that BIQA can benefit from pre-training on synthetically distorted data and from pooling two distinct feature sets. Fusing the two streams via bilinear pooling lets a single model cover both synthetic and authentic distortions.
Future work could entail:
- Expanding distortion types and datasets for pre-training to enhance generalization.
- Exploring more advanced architectures like ResNet for potential performance gains.
- Refining the model to unify the training for both synthetic and authentic distortions, enhancing its comprehensive applicability.
DB-CNN provides a framework for BIQA that bridges synthetic and authentic distortions, and a starting point for further research in image quality assessment.