Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration (2403.05906v1)
Abstract: Under-Display Camera (UDC) is an emerging technology that achieves a full-screen display by hiding the camera under the display panel. However, current UDC implementations cause severe image degradation: the incident light required for imaging is attenuated and diffracted as it passes through the display panel, producing various artifacts. Existing UDC image restoration methods predominantly rely on convolutional neural network architectures, whereas Transformer-based methods have shown superior performance on most image restoration tasks, owing to the Transformer's ability to sample global features for the local reconstruction of images. In this paper, we observe that when a Vision Transformer is applied to UDC degraded image restoration, global attention samples a large amount of redundant information and noise, and that a Transformer using sparse attention alleviates this adverse impact compared with one using ordinary dense attention. Building on this observation, we propose a Segmentation Guided Sparse Transformer (SGSFormer) for restoring high-quality images from UDC degraded inputs. Specifically, we use sparse self-attention to filter out redundant information and noise, directing the model's attention to the features most relevant to the degraded regions that need reconstruction. Moreover, we integrate the instance segmentation map as prior information to guide the sparse self-attention in filtering and focusing on the correct regions.
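To make the core idea concrete, the sketch below shows one way segmentation-guided top-k sparse self-attention can be realized. It is a minimal illustration, not the paper's SGSFormer implementation: the function name `sparse_self_attention`, the `keep_ratio` parameter, and the log-bias use of the segmentation prior are assumptions made here for clarity; learned query/key/value projections, the multi-stage architecture, and training losses are omitted.

```python
import torch
import torch.nn.functional as F

def sparse_self_attention(x, seg_prior, num_heads=4, keep_ratio=0.5):
    """Top-k sparse self-attention over flattened spatial tokens (illustrative sketch).

    x:         (B, N, C) image tokens
    seg_prior: (B, N) per-token segmentation confidence in [0, 1], used here
               (as an assumption) to bias attention toward segmented regions.
    """
    B, N, C = x.shape
    d = C // num_heads

    # Real implementations use separate learned Q/K/V projections; identity here.
    q = k = v = x.reshape(B, N, num_heads, d).transpose(1, 2)   # (B, H, N, d)

    # Scaled dot-product similarity between all query/key pairs.
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5               # (B, H, N, N)

    # Bias scores with the segmentation prior of the key tokens, so keys in
    # segmented regions are more likely to survive the sparsification step.
    scores = scores + seg_prior[:, None, None, :].log().clamp(min=-10)

    # Keep only the top-k scores per query; suppress the rest before softmax.
    k_keep = max(1, int(keep_ratio * N))
    kth = scores.topk(k_keep, dim=-1).values[..., -1:]          # k-th largest score
    scores = scores.masked_fill(scores < kth, float('-inf'))

    attn = F.softmax(scores, dim=-1)                            # sparse attention map
    out = (attn @ v).transpose(1, 2).reshape(B, N, C)
    return out

# Usage on dummy data: a 16x16 token grid with 64 channels and a random prior.
x = torch.randn(1, 256, 64)
seg = torch.rand(1, 256)
y = sparse_self_attention(x, seg)   # (1, 256, 64)
```

The top-k mask is the sparsification step: only the strongest key responses per query survive the softmax, so redundant or noisy tokens cannot dilute the reconstruction of a degraded region, while the segmentation prior steers which keys are likely to be retained.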