R3D-SWIN: Use Shifted Window Attention for Single-View 3D Reconstruction (2312.02725v3)
Published 5 Dec 2023 in cs.CV
Abstract: Recently, vision transformers have performed well in various computer vision tasks, including voxel 3D reconstruction. However, the attention windows of standard vision transformers are not multi-scale and do not exchange information with one another, which limits the accuracy of voxel 3D reconstruction. Therefore, we propose a voxel 3D reconstruction network based on shifted window attention. To the best of our knowledge, this is the first work to apply shifted window attention to voxel 3D reconstruction. Experimental results on ShapeNet verify that our method achieves SOTA accuracy in single-view reconstruction.
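For intuition, the sketch below illustrates the shifted-window mechanism from Swin Transformer that R3D-SWIN builds on: the feature map is cyclically shifted before window partitioning, so the next round of window attention connects tokens across the previous window boundaries, and then shifted back. This is a minimal, illustrative PyTorch sketch, not the authors' released code; the window size, shift size, tensor shapes, and function names are assumptions for the example, and the per-window attention step itself is omitted.

```python
# Illustrative sketch of shifted-window partitioning (Swin-style); not the
# R3D-SWIN implementation. Window/shift sizes and shapes are assumptions.
import torch


def window_partition(x, window_size):
    """Split a feature map (B, H, W, C) into non-overlapping windows of
    shape (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)


def window_reverse(windows, window_size, H, W):
    """Inverse of window_partition: reassemble windows into (B, H, W, C)."""
    B = windows.shape[0] // ((H // window_size) * (W // window_size))
    x = windows.view(B, H // window_size, W // window_size, window_size, window_size, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)


def shifted_window_pass(x, window_size=7, shift_size=3):
    """Cyclically shift the feature map so that window attention in this
    layer mixes tokens across the previous layer's window boundaries,
    then undo the shift. The attention itself is left out (identity)."""
    B, H, W, C = x.shape
    # Cyclic shift: tokens near a window border move into a neighboring window.
    shifted = torch.roll(x, shifts=(-shift_size, -shift_size), dims=(1, 2))
    windows = window_partition(shifted, window_size)
    # ... window-based multi-head self-attention would run here, per window ...
    shifted = window_reverse(windows, window_size, H, W)
    # Reverse the cyclic shift to restore the original spatial layout.
    return torch.roll(shifted, shifts=(shift_size, shift_size), dims=(1, 2))


if __name__ == "__main__":
    feat = torch.randn(2, 56, 56, 96)  # example patch-embedded feature map
    out = shifted_window_pass(feat)
    print(out.shape)  # torch.Size([2, 56, 56, 96])
```

Alternating regular and shifted windows in this way is what gives window attention cross-window connections without the quadratic cost of global attention; R3D-SWIN's claim is that this, combined with the hierarchical (multi-scale) windows of Swin, helps voxel reconstruction from a single view.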