T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image (2403.13663v1)
Abstract: Pixel2Mesh (P2M) is a classical approach for reconstructing 3D shapes from a single color image through coarse-to-fine mesh deformation. Although P2M can generate plausible global shapes, its Graph Convolutional Network (GCN) often produces overly smooth results that lose fine-grained geometric details. Moreover, P2M generates unreliable features for occluded regions and struggles with the domain gap from synthetic data to real-world images, a common challenge for single-view 3D reconstruction methods. To address these challenges, we propose a novel Transformer-boosted architecture, named T-Pixel2Mesh, inspired by the coarse-to-fine design of P2M. Specifically, we use a global Transformer to control the holistic shape and a local Transformer to progressively refine local geometric details with graph-based point upsampling. To enhance real-world reconstruction, we present the simple yet effective Linear Scale Search (LSS), which acts as a form of prompt tuning during input preprocessing. Our experiments on ShapeNet demonstrate state-of-the-art performance, while results on real-world data show the method's generalization capability.
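The abstract describes the architecture only at a high level: a global Transformer attends over all mesh vertices to set the holistic shape, and a local Transformer attends within small neighborhoods to refine detail after each round of graph-based point upsampling. The following is a minimal PyTorch sketch of that coarse-to-fine flow; the module names (`GlobalShapeTransformer`, `LocalRefineTransformer`), feature dimensions, the k-NN neighborhood attention, and the duplicate-and-jitter upsampling are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the coarse-to-fine idea from the abstract: a global Transformer over
# all vertices controls the holistic shape, then local Transformers refine
# geometry after point upsampling. Names and hyperparameters are assumptions.
import torch
import torch.nn as nn


class GlobalShapeTransformer(nn.Module):
    """Self-attention over every vertex feature to control the holistic shape."""
    def __init__(self, dim=128, heads=4, layers=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.to_offset = nn.Linear(dim, 3)  # per-vertex coordinate offset

    def forward(self, verts, feats):
        feats = self.encoder(feats)                  # (B, N, dim)
        return verts + self.to_offset(feats), feats


class LocalRefineTransformer(nn.Module):
    """Attention restricted to k nearest neighbors to refine local detail."""
    def __init__(self, dim=128, heads=4, k=8):
        super().__init__()
        self.k = k
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_offset = nn.Linear(dim, 3)

    def forward(self, verts, feats):
        B, N, dim = feats.shape
        # k-NN neighborhoods in 3D space, a simple stand-in for graph adjacency.
        idx = torch.cdist(verts, verts).topk(self.k, largest=False).indices  # (B, N, k)
        nbr = torch.gather(
            feats.unsqueeze(1).expand(B, N, N, dim), 2,
            idx.unsqueeze(-1).expand(B, N, self.k, dim))                     # (B, N, k, dim)
        q = feats.reshape(B * N, 1, dim)
        kv = nbr.reshape(B * N, self.k, dim)
        out, _ = self.attn(q, kv, kv)
        out = out.reshape(B, N, dim)
        return verts + self.to_offset(out), out


def upsample(verts, feats):
    """Toy point upsampling: duplicate each vertex with a small perturbation."""
    jitter = 0.01 * torch.randn_like(verts)
    return torch.cat([verts, verts + jitter], 1), torch.cat([feats, feats], 1)


if __name__ == "__main__":
    B, N, dim = 2, 156, 128                        # 156 vertices, as in P2M's initial ellipsoid
    verts, feats = torch.randn(B, N, 3), torch.randn(B, N, dim)
    verts, feats = GlobalShapeTransformer(dim)(verts, feats)    # coarse global shape
    for _ in range(2):                             # progressive local refinement
        verts, feats = upsample(verts, feats)
        verts, feats = LocalRefineTransformer(dim)(verts, feats)
    print(verts.shape)                             # torch.Size([2, 624, 3])
```

In this sketch the global stage sees every vertex at once, while the local stage only attends within each vertex's k-nearest neighborhood after upsampling, which mirrors the stated division between holistic shape control and progressive detail refinement.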