MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets (2404.13923v3)
Abstract: Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting the 2D generative prior into 3D space. However, such a 2D generative image prior bakes the effects of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably contain spurious correlated components. The absence of a precise material definition makes it infeasible to relight the generated assets reasonably in novel scenes, which limits their application in downstream scenarios. In contrast, humans can effortlessly circumvent this ambiguity by deducing the material of an object from its appearance and semantics. Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework that infers the underlying material from a 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then employ region unification to ensure the coherence of the object parts. To fuel the learning of the semantic prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.
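The UV-stack fusion described in the abstract can be illustrated with a minimal sketch. It assumes each view contributes one UV map of integer material labels, that texels invisible from a view carry a placeholder value, and that each view has a single scalar weight. The abstract does not say how weights are computed or how region unification works, so the function name `fuse_uv_stack`, the `UNSEEN` marker, the label ids, and the weights below are all hypothetical, and region unification is left out.

```python
from collections import defaultdict

UNSEEN = -1  # hypothetical marker for texels not visible from a given view


def fuse_uv_stack(uv_stack, view_weights):
    """Fuse per-view UV label maps by weighted voting, texel by texel.

    uv_stack: list of 2D lists of integer material labels (one map per view).
    view_weights: one non-negative weight per view (an assumption; the
    source does not specify how views are weighted).
    """
    height, width = len(uv_stack[0]), len(uv_stack[0][0])
    fused = [[UNSEEN] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            votes = defaultdict(float)
            for label_map, weight in zip(uv_stack, view_weights):
                label = label_map[y][x]
                if label != UNSEEN:
                    votes[label] += weight  # accumulate this view's vote
            if votes:
                # Keep the label with the largest accumulated weight.
                fused[y][x] = max(votes, key=votes.get)
    return fused


# Three hypothetical views of a 2x2 UV map; labels: 0 = metal, 1 = wood.
stack = [
    [[0, 1], [UNSEEN, 1]],
    [[0, 0], [UNSEEN, 1]],
    [[1, 0], [UNSEEN, UNSEEN]],
]
weights = [1.0, 0.5, 2.0]
fused = fuse_uv_stack(stack, weights)
print(fused)  # [[1, 0], [-1, 1]]
```

In this toy input the heavily weighted third view outvotes the other two at the first texel, while a texel seen by no view stays unlabeled. A later step, such as the region unification mentioned in the abstract, would be needed to assign it a material.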
- Zeyu Li
- Ruitong Gan
- Chuanchen Luo
- Yuxi Wang
- Jiaheng Liu
- Ziwei Zhu
- Man Zhang
- Qing Li
- Xucheng Yin
- Zhaoxiang Zhang
- Junran Peng