Hybrid of DiffStride and Spectral Pooling in Convolutional Neural Networks
Abstract: Stride determines the distance between adjacent filter positions as a filter moves across its input. A fixed stride can cause important information contained in the image to be missed, so that this information never contributes to the classification. Previous research therefore introduced the DiffStride method, a strided-convolution technique that learns its own stride values. Max-pooling downsampling, in turn, suffers from severe quantization and imposes a constraining lower bound on the amount of information preserved. Spectral Pooling relaxes this lower bound by truncating the representation in the frequency domain. In this research, a CNN model is proposed that combines a downsampling technique whose strides are learned through backpropagation (DiffStride) with the Spectral Pooling technique. Together, DiffStride and Spectral Pooling are expected to preserve most of the information contained in the image. We compare the Hybrid method, a combined implementation of Spectral Pooling and DiffStride, against the baseline method, a DiffStride implementation on ResNet-18. The hybrid of DiffStride and Spectral Pooling improves accuracy over the DiffStride baseline by 0.0094. This shows that the Hybrid method can preserve most of the information by truncating the representation in the frequency domain while determining strides through backpropagation.
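To illustrate the frequency-domain truncation that the abstract attributes to Spectral Pooling, the following is a minimal NumPy sketch, not the paper's implementation: downsampling a single 2D feature map by keeping only its lowest DFT frequencies and inverting back to the spatial domain. The rescaling factor is an assumption chosen so that a constant input maps to the same constant.

```python
import numpy as np

def spectral_pool(x, out_h, out_w):
    """Downsample a 2D array by truncating its 2D DFT to the
    lowest out_h x out_w frequencies, then inverting the DFT."""
    h, w = x.shape
    f = np.fft.fftshift(np.fft.fft2(x))   # low frequencies at the center
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    f_cropped = f[top:top + out_h, left:left + out_w]
    y = np.fft.ifft2(np.fft.ifftshift(f_cropped))
    # Rescale so mean intensity is preserved after truncation (assumption)
    y *= (out_h * out_w) / (h * w)
    return np.real(y)

x = np.random.rand(8, 8)
y = spectral_pool(x, 4, 4)
print(y.shape)  # (4, 4)
```

Unlike max pooling, this discards only the highest-frequency content, which is the sense in which the abstract says the lower bound on preserved information is relaxed. DiffStride additionally makes the crop size (i.e., the effective stride) a learnable parameter via a smooth mask, which this sketch does not attempt.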