UNeR3D: Versatile and Scalable 3D RGB Point Cloud Generation from 2D Images in Unsupervised Reconstruction (2312.06706v1)

Published 10 Dec 2023 in cs.CV

Abstract: In 3D reconstruction from 2D images, a persistent challenge is achieving high-precision reconstructions without relying on 3D ground-truth data. We present UNeR3D, a pioneering unsupervised method that sets a new standard for generating detailed 3D reconstructions solely from 2D views. Our model significantly cuts the training costs tied to supervised approaches and introduces RGB coloration to 3D point clouds, enriching the visual experience. Employing an inverse-distance-weighting technique for color rendering, UNeR3D ensures seamless color transitions and enhances visual fidelity. Our model's flexible architecture supports training with any number of views and, uniquely, is not constrained at inference time by the number of views used during training: it can reconstruct from an arbitrary number of views, offering unparalleled versatility. Additionally, the model's continuous spatial input domain allows point clouds to be generated at any desired resolution, enabling the creation of high-resolution 3D RGB point clouds. We solidify the reconstruction process with a novel multi-view geometric loss and color loss, and demonstrate that our model excels with single-view inputs and beyond, reshaping the paradigm of unsupervised learning in 3D vision. Our contributions mark a substantial step forward in 3D vision, opening new horizons for content creation across diverse applications. Code is available at https://github.com/HongbinLin3589/UNeR3D.
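The color-rendering step described in the abstract is based on inverse distance weighting (Shepard interpolation), a standard scheme in which a query point's color is the distance-weighted average of the colors of nearby known points, so that closer neighbors dominate and transitions stay smooth. The sketch below is a minimal, generic NumPy illustration of that idea, not the paper's actual implementation; the function name `idw_colors` and the parameters `k`, `power`, and `eps` are illustrative assumptions.

```python
import numpy as np

def idw_colors(query_pts, ref_pts, ref_colors, k=8, power=2.0, eps=1e-8):
    """Color each query point by an inverse-distance-weighted average of the
    colors of its k nearest reference points (Shepard interpolation)."""
    colors = np.empty((len(query_pts), 3))
    for i, q in enumerate(query_pts):
        d = np.linalg.norm(ref_pts - q, axis=1)      # distances to all reference points
        nn = np.argsort(d)[:k]                        # indices of the k nearest neighbors
        w = 1.0 / (d[nn] ** power + eps)              # inverse-distance weights
        colors[i] = (w[:, None] * ref_colors[nn]).sum(axis=0) / w.sum()
    return colors

# Toy usage: color two query points from four reference points.
ref_pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
ref_colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
query = np.array([[0.2, 0.1, 0.0], [0.5, 0.5, 0.5]])
print(idw_colors(query, ref_pts, ref_colors, k=3))
```

In the paper this weighting is embedded in a neural rendering pipeline trained with the multi-view geometric and color losses mentioned above; the sketch shows only the interpolation step itself.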

