Partial Large Kernel CNNs for Efficient Super-Resolution (2404.11848v1)
Abstract: Recently, in the super-resolution (SR) domain, Transformers have outperformed CNNs with fewer FLOPs and fewer parameters, since they can model long-range dependencies and adaptively adjust weights based on the input instance. In this paper, we demonstrate that CNNs, although they receive less attention in the current SR domain, surpass Transformers in direct efficiency measures. By incorporating the advantages of Transformers into CNNs, we aim to achieve both computational efficiency and enhanced performance. However, using a large kernel in the SR domain, which mainly processes large images, incurs a large computational overhead. To overcome this, we propose novel approaches to employing the large kernel, which reduce latency by 86% compared to a naive large kernel, and leverage an Element-wise Attention module to imitate instance-dependent weights. As a result, we introduce Partial Large Kernel CNNs for Efficient Super-Resolution (PLKSR), which achieves state-of-the-art performance on four datasets at a scale of ×4, with reductions of 68.1% in latency and 80.2% in maximum GPU memory occupancy compared to SRFormer-light.
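The abstract compresses two mechanisms: a large-kernel convolution applied to only part of the channels, and an element-wise attention gate that imitates the instance-dependent weighting of self-attention. Below is a minimal PyTorch sketch of both ideas under stated assumptions; the class names, the 17×17 kernel, and the 0.25 channel split are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class PartialLargeKernelConv(nn.Module):
    """Hypothetical sketch: the expensive large kernel touches only a
    fraction of the channels; the rest pass through unchanged, which is
    where the latency saving comes from."""

    def __init__(self, dim: int, kernel_size: int = 17, split_ratio: float = 0.25):
        super().__init__()
        self.split = int(dim * split_ratio)  # channels routed through the large kernel
        self.conv = nn.Conv2d(self.split, self.split, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split along the channel dim, convolve one part, pass the other through.
        x1, x2 = torch.split(x, [self.split, x.size(1) - self.split], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)


class ElementwiseAttention(nn.Module):
    """Hypothetical sketch: a sigmoid gate of the same shape as the feature
    map rescales every element, approximating instance-dependent weights
    at convolutional cost."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)  # (batch, channels, height, width)
    block = nn.Sequential(PartialLargeKernelConv(64), ElementwiseAttention(64))
    print(block(x).shape)  # torch.Size([1, 64, 32, 32])
```

With a 0.25 split, only a quarter of the channels pass through the 17×17 convolution, so the large-kernel cost falls roughly in proportion to the split ratio; this matches the intuition behind the reported latency reduction, though the paper's actual block design may differ.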
- Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295–307, 2015.
- Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 136–144, 2017.
- Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), pages 286–301, 2018.
- Anchor-based plain net for mobile image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2494–2502, 2021.
- Hybrid pixel-unshuffled network for lightweight image super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 2375–2383, 2023.
- ShuffleMixer: An efficient convnet for image super-resolution. Advances in Neural Information Processing Systems, 35:17314–17326, 2022.
- Spatially-adaptive feature modulation for efficient image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13190–13199, 2023.
- Enhancing real-time super resolution with partial convolution and efficient variance attention. In Proceedings of the 31st ACM International Conference on Multimedia, pages 5348–5357, 2023.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
- Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
- Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5728–5739, June 2022.
- SwinIR: Image restoration using Swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1833–1844, 2021.
- Efficient long-range attention network for image super-resolution. In European Conference on Computer Vision, pages 649–667. Springer, 2022.
- Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22367–22377, 2023.
- Omni aggregation networks for lightweight image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22378–22387, 2023.
- SRFormer: Permuted self-attention for single image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12780–12791, 2023.
- Dual aggregation transformer for image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12312–12321, 2023.
- Feature modulation transformer: Cross-refinement of global representation via high-frequency prior for image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12514–12524, 2023.
- DLGSANet: Lightweight dynamic local and global self-attention networks for image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12792–12801, 2023.
- Unfolding once is enough: A deployment-friendly transformer unit for super-resolution. In Proceedings of the 31st ACM International Conference on Multimedia, pages 7952–7960, 2023.
- Run, don't walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12021–12031, 2023.
- SHViT: Single-head vision transformer with memory efficient macro design. arXiv preprint arXiv:2401.16456, 2024.
- Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Efficient image super-resolution using pixel attention. In Computer Vision – ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16, pages 56–72. Springer, 2020.
- A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11976–11986, 2022.
- On the connection between local attention and dynamic depth-wise convolution. In International Conference on Learning Representations, 2022.
- Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11963–11975, 2022.
- More convnets in the 2020s: Scaling up kernels beyond 51x51 using sparsity. arXiv preprint arXiv:2207.03620, 2022.
- ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), pages 116–131, 2018.
- RepVGG: Making VGG-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13733–13742, 2021.
- FlashAttention-2: Faster attention with better parallelism and work partitioning. arXiv preprint arXiv:2307.08691, 2023.
- Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1874–1883, 2016.
- N-gram in Swin transformers for efficient lightweight image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2071–2081, 2023.
- Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7132–7141, 2018.
- SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5659–5667, 2017.
- CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), pages 3–19, 2018.
- NTIRE 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 114–125, 2017.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the British Machine Vision Conference, 2012.
- On single image scale-up using sparse-representations. In Curves and Surfaces: 7th International Conference, Avignon, France, June 24–30, 2010, Revised Selected Papers 7, pages 711–730. Springer, 2012.
- A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings Eighth IEEE International Conference on Computer Vision (ICCV 2001), volume 2, pages 416–423. IEEE, 2001.
- Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5197–5206, 2015.
- Sketch-based manga retrieval using Manga109 dataset. Multimedia Tools and Applications, 76:21811–21838, 2017.
- How do vision transformers work? In International Conference on Learning Representations, 2021.
- Interpreting super-resolution networks with local attribution maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9199–9208, 2021.
- Improving image restoration by revisiting global information aggregation. In European Conference on Computer Vision, pages 53–71. Springer, 2022.
- UniRepLKNet: A universal perception large-kernel convnet for audio, video, point cloud, time-series and image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.