As large as it gets: Learning infinitely large Filters via Neural Implicit Functions in the Fourier Domain (2307.10001v2)
Abstract: Recent work in neural networks for image classification has seen a strong tendency towards increasing the spatial context. Whether achieved through large convolution kernels or self-attention, models scale poorly with the increased spatial context, such that the improved model accuracy often comes at significant cost. In this paper, we propose a module for studying the effective filter size of convolutional neural networks. To facilitate such a study, several challenges need to be addressed: 1) we need an effective means to train models with large filters (potentially as large as the input data) without increasing the number of learnable parameters; 2) the employed convolution operation should be a plug-and-play module that can replace conventional convolutions in a CNN and allow for an efficient implementation in current frameworks; 3) the study of filter sizes has to be decoupled from other aspects such as the network width or the number of learnable parameters; and 4) the cost of the convolution operation itself has to remain manageable, i.e., we cannot naively increase the size of the convolution kernel. To address these challenges, we propose to learn the frequency representations of filter weights as neural implicit functions, such that the better scalability of convolution in the frequency domain can be leveraged. Additionally, due to the implementation of the proposed neural implicit function, even large and expressive spatial filters can be parameterized by only a few learnable weights. Our analysis shows that, although the proposed networks could learn very large convolution kernels, the learned filters are well localized and relatively small in practice when transformed from the frequency to the spatial domain. We anticipate that our analysis of individually optimized filter sizes will allow for more efficient, yet effective, models in the future. Code: https://github.com/GeJulia/NIFF
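The core mechanism described above, predicting a filter's frequency response with a small coordinate MLP and applying it as an element-wise product after an FFT, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository): the module name `ImplicitFrequencyFilter`, the MLP depth and width, and the depth-wise application are choices made for the example.

```python
# Sketch: a depth-wise "infinitely large" convolution whose filter is
# parameterized in the frequency domain by a neural implicit function.
# Names and architecture details are illustrative assumptions.
import torch
import torch.nn as nn


class ImplicitFrequencyFilter(nn.Module):
    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        self.channels = channels
        # Implicit function: a 2D frequency coordinate (u, v) is mapped to a
        # complex filter response (real + imaginary part) for every channel.
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, 2 * channels),
        )

    def _filter(self, h: int, w: int, device, dtype) -> torch.Tensor:
        # Normalized frequency grid matching the torch.fft.rfft2 output layout.
        fy = torch.fft.fftfreq(h, device=device, dtype=dtype)        # (h,)
        fx = torch.fft.rfftfreq(w, device=device, dtype=dtype)       # (w//2 + 1,)
        grid = torch.stack(torch.meshgrid(fy, fx, indexing="ij"), dim=-1)
        resp = self.mlp(grid)                                        # (h, w//2+1, 2C)
        real, imag = resp.chunk(2, dim=-1)
        return torch.complex(real, imag).permute(2, 0, 1)            # (C, h, w//2+1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Convolution theorem: spatial convolution becomes a point-wise
        # multiplication in the Fourier domain, regardless of kernel size.
        b, c, h, w = x.shape
        x_f = torch.fft.rfft2(x, norm="ortho")                       # (B, C, H, W//2+1)
        y_f = x_f * self._filter(h, w, x.device, x.dtype).unsqueeze(0)
        return torch.fft.irfft2(y_f, s=(h, w), norm="ortho")


if __name__ == "__main__":
    layer = ImplicitFrequencyFilter(channels=64)
    out = layer(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Because the MLP's parameter count is independent of the spatial resolution, the effective kernel can span the entire input while the number of learnable weights stays fixed, which is the property the study outlined in the abstract relies on.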
Authors: Julia Grabinski, Janis Keuper, Margret Keuper