SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement (2408.00653v1)
Abstract: We present SF3D, a novel method for rapid and high-quality textured object mesh reconstruction from a single image in just 0.5 seconds. Unlike most existing approaches, SF3D is explicitly trained for mesh generation, incorporating a fast UV unwrapping technique that enables swift texture generation rather than relying on vertex colors. The method also learns to predict material parameters and normal maps to enhance the visual quality of the reconstructed 3D meshes. Furthermore, SF3D integrates a delighting step to effectively remove low-frequency illumination effects, ensuring that the reconstructed meshes can be easily used in novel illumination conditions. Experiments show that SF3D outperforms existing techniques. Project page: https://stable-fast-3d.github.io
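The abstract contrasts UV-textured meshes with vertex colors but does not describe the unwrapping procedure itself. The following is a minimal sketch of one simple and fast unwrapping approach, box (cube) projection, assuming a triangle mesh stored as NumPy arrays. The function name `box_project_uv` and the 3 x 2 atlas tiling are hypothetical illustrative choices and are not taken from the paper. Each triangle is assigned to one of six charts by the dominant axis and sign of its face normal, then projected onto the two remaining axes.

```python
import numpy as np


def box_project_uv(vertices, faces):
    """Box (cube) projection UV unwrapping.

    Illustrative sketch only; hypothetical helper, not the SF3D implementation.
    vertices: (V, 3) array of positions; faces: (F, 3) array of vertex indices.
    Returns per-corner UVs of shape (F, 3, 2) in [0, 1] and chart ids of shape (F,).
    """
    v = np.asarray(vertices, dtype=np.float64)
    f = np.asarray(faces, dtype=np.int64)
    tri = v[f]  # (F, 3, 3): positions of each face corner

    # Unnormalized face normals decide which box side each triangle projects onto.
    normals = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    axis = np.argmax(np.abs(normals), axis=1)  # dominant axis: 0=x, 1=y, 2=z
    negative = normals[np.arange(len(f)), axis] < 0
    chart = 2 * axis + negative.astype(np.int64)  # 0:+x 1:-x 2:+y 3:-y 4:+z 5:-z

    # Normalize positions into the unit cube so each projection lies in [0, 1].
    lo, hi = v.min(axis=0), v.max(axis=0)
    extent = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat axes
    unit = (tri - lo) / extent

    # Drop the dominant axis and keep the other two coordinates as (u, v).
    keep = np.array([[1, 2], [0, 2], [0, 1]])[axis]  # (F, 2)
    local_uv = np.take_along_axis(unit, keep[:, None, :], axis=2)  # (F, 3, 2)

    # Pack the six charts into a 3 x 2 grid of tiles inside the atlas.
    tile_u = (chart % 3)[:, None]
    tile_v = (chart // 3)[:, None]
    uv = np.empty_like(local_uv)
    uv[..., 0] = (local_uv[..., 0] + tile_u) / 3.0
    uv[..., 1] = (local_uv[..., 1] + tile_v) / 2.0
    return uv, chart
```

With per-corner UVs, color can be stored in a texture atlas whose resolution is independent of the vertex count, whereas vertex colors can only vary as finely as the mesh is tessellated. This naive sketch does not resolve overlaps between triangles that land on the same region of one chart (e.g. occluded surfaces of a non-convex object facing the same direction); dedicated unwrappers such as xatlas segment and pack charts to avoid that.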