Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-resolution (2401.00740v2)
Abstract: The effective extraction of spatial-angular features plays a crucial role in light field image super-resolution (LFSR), and the introduction of convolutions and Transformers has led to significant improvement in this area. Nevertheless, due to the large 4D data volume of light field images, many existing methods opt to decompose the data into a number of lower-dimensional subspaces and apply Transformers in each subspace individually. As a side effect, these methods inadvertently restrict the self-attention mechanism to a One-to-One scheme that accesses only a limited subset of LF data, effectively preventing comprehensive optimization over all spatial and angular cues. In this paper, we identify this limitation as subspace isolation and introduce a novel Many-to-Many Transformer (M2MT) to address it. M2MT aggregates angular information in the spatial subspace before performing the self-attention mechanism, enabling complete access to all information across all sub-aperture images (SAIs) in a light field image. Consequently, M2MT can comprehensively capture long-range correlation dependencies. With M2MT as the pivotal component, we develop a simple yet effective M2MT network for LFSR. Our experimental results demonstrate that M2MT achieves state-of-the-art performance across various public datasets. We further conduct an in-depth analysis using local attribution maps (LAM) for visual interpretability, and the results validate that M2MT is empowered with a truly non-local context in both the spatial and angular subspaces, mitigating subspace isolation and acquiring effective spatial-angular representations.
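To make the aggregation step concrete, below is a minimal PyTorch sketch of the Many-to-Many idea as we read it from the abstract. This is an illustration, not the authors' implementation: the module name `M2MSelfAttention`, the `(B, U, V, H, W, C)` tensor layout, and the use of `nn.MultiheadAttention` are all our assumptions. The key point it demonstrates is that the angular dimensions are folded into each spatial token's feature vector, so self-attention over spatial positions sees information from all SAIs at once rather than from a single isolated subspace.

```python
# Hypothetical sketch of Many-to-Many attention over a 4D light field.
# Not the authors' code: module name, layout, and attention choice assumed.
import torch
import torch.nn as nn

class M2MSelfAttention(nn.Module):
    def __init__(self, channels: int, angular: int, heads: int = 4):
        super().__init__()
        # Each token carries features from all angular x angular SAIs,
        # so the embedding dimension is C * U * V.
        self.attn = nn.MultiheadAttention(
            embed_dim=channels * angular * angular,
            num_heads=heads,
            batch_first=True,
        )

    def forward(self, lf: torch.Tensor) -> torch.Tensor:
        # lf: (B, U, V, H, W, C) -- batch of 4D light fields.
        b, u, v, h, w, c = lf.shape
        # Aggregate angular information into each spatial token:
        # (B, U, V, H, W, C) -> (B, H*W, U*V*C).
        tokens = lf.permute(0, 3, 4, 1, 2, 5).reshape(b, h * w, u * v * c)
        # Self-attention over the H*W spatial positions now has
        # complete access to all SAIs (Many-to-Many), in contrast to
        # a One-to-One scheme confined to a single subspace.
        tokens, _ = self.attn(tokens, tokens, tokens)
        # Restore the original light field layout.
        return tokens.reshape(b, h, w, u, v, c).permute(0, 3, 4, 1, 2, 5)

# Usage: a 5x5 angular grid of 32x32 patches with 16 feature channels.
lf = torch.randn(1, 5, 5, 32, 32, 16)
out = M2MSelfAttention(channels=16, angular=5)(lf)
print(out.shape)  # torch.Size([1, 5, 5, 32, 32, 16])
```

The trade-off this sketch makes visible is the one the abstract motivates: merging the angular dimensions into the token features avoids decomposing the 4D data into isolated subspaces, at the cost of a larger embedding dimension per spatial token.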