T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image (2403.13663v1)

Published 20 Mar 2024 in cs.CV

Abstract: Pixel2Mesh (P2M) is a classical approach for reconstructing 3D shapes from a single color image through coarse-to-fine mesh deformation. Although P2M is capable of generating plausible global shapes, its Graph Convolution Network (GCN) often produces overly smooth results, causing the loss of fine-grained geometry details. Moreover, P2M generates non-credible features for occluded regions and struggles with the domain gap from synthetic data to real-world images, which is a common challenge for single-view 3D reconstruction methods. To address these challenges, we propose a novel Transformer-boosted architecture, named T-Pixel2Mesh, inspired by the coarse-to-fine approach of P2M. Specifically, we use a global Transformer to control the holistic shape and a local Transformer to progressively refine the local geometry details with graph-based point upsampling. To enhance real-world reconstruction, we present the simple yet effective Linear Scale Search (LSS), which serves as prompt tuning during the input preprocessing. Our experiments on ShapeNet demonstrate state-of-the-art performance, while results on real-world data show the generalization capability.
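The abstract mentions "graph-based point upsampling" as the mechanism by which the local Transformer progressively densifies the mesh between refinement stages. A common way to realize this in coarse-to-fine mesh pipelines (including the original Pixel2Mesh line of work) is edge-midpoint subdivision of the vertex graph. The sketch below illustrates that idea only; the function name and representation are illustrative assumptions, not the paper's actual implementation.

```python
def upsample_mesh(vertices, edges):
    """Split every edge of a vertex graph at its midpoint.

    A minimal sketch of graph-based point upsampling: each original
    edge (i, j) is replaced by (i, m) and (m, j), where m indexes a
    new vertex placed at the midpoint of vertices i and j.

    vertices: list of (x, y, z) tuples
    edges:    list of (i, j) index pairs into `vertices`
    Returns (new_vertices, new_edges).
    """
    new_vertices = list(vertices)
    new_edges = []
    midpoint_of = {}  # canonical edge -> index of its midpoint vertex
    for i, j in edges:
        key = (min(i, j), max(i, j))
        if key not in midpoint_of:
            vi, vj = vertices[i], vertices[j]
            mid = tuple((a + b) / 2.0 for a, b in zip(vi, vj))
            midpoint_of[key] = len(new_vertices)
            new_vertices.append(mid)
        m = midpoint_of[key]
        new_edges.append((i, m))
        new_edges.append((m, j))
    return new_vertices, new_edges
```

Applied once to a triangle (3 vertices, 3 edges), this yields 6 vertices and 6 edges; repeated application is what gives the coarse-to-fine progression, with the network predicting per-vertex offsets after each upsampling step.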

