Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach (2401.05633v2)
Abstract: Recent progress in single-image super-resolution (SISR) has achieved remarkable performance, yet the computational costs of these methods remain a challenge for deployment on resource-constrained devices. In particular, transformer-based methods, which leverage self-attention mechanisms, have led to significant breakthroughs but also introduce substantial computational costs. To tackle this issue, we introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR), offering an effective and efficient solution for lightweight image super-resolution. The proposed method inherits the advantages of both convolution-based and transformer-based approaches. Specifically, CFSR uses large-kernel convolutions as the feature mixer in place of the self-attention module, modeling long-range dependencies over a large receptive field with minimal computational overhead. Furthermore, we propose an edge-preserving feed-forward network (EFN) designed to achieve local feature aggregation while effectively preserving high-frequency information. Extensive experiments demonstrate that CFSR achieves a better trade-off between computational cost and performance than existing lightweight SR methods. When benchmarked against state-of-the-art methods such as ShuffleMixer, the proposed CFSR achieves a gain of 0.39 dB on the Urban100 dataset for the ×2 super-resolution task while requiring 26% fewer parameters and 31% fewer FLOPs. The code and pre-trained models are available at https://github.com/Aitical/CFSR.
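To give a rough sense of why a large-kernel convolution can be a cheaper feature mixer than self-attention, the following back-of-the-envelope sketch counts parameters and multiply-accumulates (MACs) for the two operations. It is not taken from the CFSR code: the function names are hypothetical, the channel width (64), kernel size (9) and feature-map size (64×64) are illustrative assumptions, the convolution is assumed to be depthwise, and the attention is assumed to be global with Q, K, V and output projections (windowed attention, as used in many SR transformers, would cost less).

```python
def depthwise_conv_cost(channels, kernel, height, width):
    """Parameters and MACs of one depthwise k x k convolution with bias.

    Assumption: stride 1 and 'same' padding, so every output pixel of every
    channel needs kernel * kernel multiply-accumulates.
    """
    params = channels * kernel * kernel + channels
    macs = height * width * channels * kernel * kernel
    return params, macs


def self_attention_cost(channels, height, width):
    """Parameters and MACs of global self-attention over all pixels.

    Assumption: four linear projections (Q, K, V, output) with bias; the head
    count does not change these totals, so it is not a parameter here.
    """
    tokens = height * width
    params = 4 * (channels * channels + channels)
    projection_macs = 4 * tokens * channels * channels
    # Q @ K^T plus the attention-weighted sum over V, each tokens^2 * channels.
    attention_macs = 2 * tokens * tokens * channels
    return params, projection_macs + attention_macs


if __name__ == "__main__":
    # Illustrative sizes only; these are not the paper's configuration.
    conv_params, conv_macs = depthwise_conv_cost(64, 9, 64, 64)
    attn_params, attn_macs = self_attention_cost(64, 64, 64)
    print(f"depthwise 9x9 conv: {conv_params} params, {conv_macs} MACs")
    print(f"global attention:   {attn_params} params, {attn_macs} MACs")
```

The attention cost grows quadratically with the number of pixels, whereas the convolution cost grows linearly, which is the trade-off the abstract refers to when it describes the large-kernel mixer as having minimal computational overhead.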
Authors:
- Junjun Jiang
- Junpeng Jiang
- Xianming Liu
- Gang Wu