Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment (2401.02614v1)
Abstract: Quality assessment of images and videos depends on both local details and global semantics, whereas general data sampling methods (e.g., resizing, cropping, or grid-based fragments) fail to capture both simultaneously. To address this deficiency, current approaches adopt multi-branch models that take multi-resolution data as input, which increases model complexity. In this work, instead of stacking up models, we explore a more elegant data sampling method (named SAMA, for scaling and masking) that compacts both local and global content into a regular input size. The basic idea is to first scale the data into a pyramid and then reduce the pyramid to a regular data dimension with a masking strategy. Owing to the spatial and temporal redundancy in images and videos, the processed data retains multi-scale characteristics at a regular input size and can therefore be processed by a single-branch model. We verify the sampling method on image and video quality assessment. Experiments show that our sampling method significantly improves the performance of current single-branch models and achieves performance competitive with multi-branch models without extra model complexity. The source code will be available at https://github.com/Sissuire/SAMA.
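The scale-then-mask idea described in the abstract can be illustrated with a minimal sketch for a single image. This is not the authors' implementation: the function names `resize_nearest` and `scale_and_mask`, the nearest-neighbour resizing, the 7×7 grid of 32-pixel patches, the three pyramid levels, and the random per-cell choice of level are all assumptions made for illustration. The abstract does not specify the actual masking strategy, and the temporal (video) case is not covered here.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize of an (H, W, C) array to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def scale_and_mask(img, grid=7, patch=32, levels=3, seed=0):
    """Illustrative sketch: compact a multi-scale pyramid into one fixed-size input.

    Hypothetical scheme (an assumption, not the paper's exact method):
    level k of the pyramid has side length grid * patch * 2**k. Each output
    grid cell keeps one patch from one randomly chosen level and discards
    (masks) the others, so the output always has side grid * patch.
    """
    rng = np.random.default_rng(seed)
    out_size = grid * patch
    # Scaling step: level 0 is the coarsest (global) view, higher levels are finer.
    pyramid = [resize_nearest(img, out_size * 2**k) for k in range(levels)]
    out = np.empty((out_size, out_size, img.shape[2]), dtype=img.dtype)
    # Masking step: one surviving level per grid cell.
    level_map = rng.integers(0, levels, size=(grid, grid))
    for i in range(grid):
        for j in range(grid):
            k = int(level_map[i, j])
            cell = patch * 2**k  # side of this cell's region at level k
            # Random patch-sized crop inside the cell's region (offset 0 at level 0).
            dy, dx = rng.integers(0, cell - patch + 1, size=2)
            y, x = i * cell + dy, j * cell + dx
            out[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = \
                pyramid[k][y:y + patch, x:x + patch]
    return out, level_map
```

Under these assumptions, cells drawn from level 0 carry a downscaled view of their region (global content), while cells drawn from finer levels carry crops closer to native resolution (local detail). The result has the same shape as an ordinary resized input, so a single-branch model could consume it unchanged.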