
R3D-SWIN: Use Shifted Window Attention for Single-View 3D Reconstruction (2312.02725v3)

Published 5 Dec 2023 in cs.CV

Abstract: Recently, vision transformers have performed well in various computer vision tasks, including voxel 3D reconstruction. However, the windows of the vision transformer are not multi-scale, and there is no connection between the windows, which limits the accuracy of voxel 3D reconstruction. Therefore, we propose a voxel 3D reconstruction network based on shifted window attention. To the best of our knowledge, this is the first work to apply shifted window attention to voxel 3D reconstruction. Experimental results on ShapeNet verify our method achieves SOTA accuracy in single-view reconstruction.
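The core mechanism the abstract refers to, shifted window attention (from the Swin Transformer), alternates attention within fixed local windows with attention over cyclically shifted windows, so information flows between neighbouring windows across layers. The following is a minimal NumPy sketch of that idea, not the authors' R3D-SWIN implementation: it uses a single head with no learned QKV projections, and omits the relative position bias and the attention masking that real Swin applies to wrapped-around tokens after the cyclic shift.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping (ws*ws, C) windows."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_reverse(wins, ws, H, W):
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    C = wins.shape[-1]
    x = wins.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def window_self_attention(wins):
    """Plain softmax self-attention inside each window (no projections, one head)."""
    scale = wins.shape[-1] ** -0.5
    attn = np.einsum('ntc,nsc->nts', wins, wins) * scale
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return np.einsum('nts,nsc->ntc', attn, wins)

def shifted_window_block(x, ws, shift=0):
    """One W-MSA (shift=0) or SW-MSA (shift>0) step: shift, attend per window, unshift."""
    H, W, _ = x.shape
    if shift:
        x = np.roll(x, (-shift, -shift), axis=(0, 1))  # cyclic shift
    out = window_reverse(window_self_attention(window_partition(x, ws)), ws, H, W)
    if shift:
        out = np.roll(out, (shift, shift), axis=(0, 1))  # undo the shift
    return out

x = np.random.rand(8, 8, 16)
y = shifted_window_block(x, ws=4)            # regular windows
z = shifted_window_block(y, ws=4, shift=2)   # shifted windows connect neighbours
print(y.shape, z.shape)
```

Stacking the two variants is what gives the hierarchy its cross-window connections: a token on a window border attends to different neighbours in the shifted layer than in the regular one.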

