CNNs for JPEGs: A Study in Computational Cost (2012.14426v3)
Abstract: Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade, defining the state of the art in several computer vision tasks. CNNs can learn robust representations of the data directly from RGB pixels. However, most image data are available in compressed format, of which JPEG is the most widely used owing to transmission and storage purposes, demanding a preliminary decoding process that has a high computational load and memory usage. For this reason, deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years. These methods usually extract a frequency-domain representation of the image, such as the DCT, by partial decoding, and then adapt typical CNN architectures to work with it. One limitation of current works is that, in order to accommodate the frequency-domain data, the modifications made to the original model significantly increase its number of parameters and computational complexity. On one hand, these methods have faster preprocessing, since the cost of fully decoding the images is avoided; on the other hand, the cost of passing the images through the model increases, mitigating the possible speedup. In this paper, we present a further study of the computational cost of deep models designed for the frequency domain, evaluating the cost of both decoding the images and passing them through the network. We also propose handcrafted and data-driven techniques for reducing the computational complexity and the number of parameters of these models, keeping them close to their RGB baselines and leading to efficient models with a better trade-off between computational cost and accuracy.
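To make the frequency-domain input concrete: a partial JPEG decoder stops after entropy decoding and dequantization, exposing the 8×8 block DCT coefficients instead of RGB pixels. The sketch below simulates that representation from a pixel array using `scipy` (an illustrative stand-in; in practice the coefficients are read directly from the bitstream, e.g. via libjpeg). The resulting (H/8, W/8, 64) tensor is the kind of input the adapted CNN architectures discussed above consume; the function name `blockwise_dct` is ours, not from the paper.

```python
import numpy as np
from scipy.fft import dctn

def blockwise_dct(img, block=8):
    """Compute the 8x8 block DCT-II coefficients of a grayscale image,
    mimicking the frequency-domain tensor a partial JPEG decoder exposes."""
    h, w = img.shape
    assert h % block == 0 and w % block == 0, "image must tile into 8x8 blocks"
    # Rearrange into a (h/8, w/8, 8, 8) grid of blocks
    blocks = img.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    # 2-D DCT-II per block; 'ortho' normalization as in JPEG
    coeffs = dctn(blocks, axes=(-2, -1), norm="ortho")
    # Flatten each block's 64 coefficients into channels: one spatial
    # position per block, 64 frequency channels -- a natural CNN input
    return coeffs.reshape(h // block, w // block, block * block)

img = np.random.default_rng(0).random((32, 32))
feat = blockwise_dct(img)
print(feat.shape)  # (4, 4, 64)
```

Note the spatial resolution drops by 8× in each dimension while the channel count grows to 64, which is precisely why naive adaptations of RGB architectures (e.g. widening early layers to absorb the extra channels) inflate the parameter count and FLOPs that this paper sets out to reduce.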