MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D (2411.02336v1)
Abstract: Texturing is a crucial step in the 3D asset production workflow that enhances the visual appeal and diversity of 3D assets. Despite recent advances in Text-to-Texture (T2T) generation, existing methods often yield subpar results, primarily due to local discontinuities, inconsistencies across multiple views, and heavy dependence on UV unwrapping outcomes. To tackle these challenges, we propose MVPaint, a generation-refinement 3D texturing framework that produces high-resolution, seamless textures while emphasizing multi-view consistency. MVPaint consists of three key modules. 1) Synchronized Multi-view Generation (SMG): given a 3D mesh, MVPaint first generates multi-view images simultaneously with an SMG model, yielding a coarse texture that leaves some regions unpainted due to missing observations. 2) Spatial-aware 3D Inpainting (S3I): to ensure complete 3D texturing, the S3I method is specifically designed to texture previously unobserved areas. 3) UV Refinement (UVR): MVPaint then improves texture quality in UV space by first performing UV-space super-resolution, followed by a spatial-aware seam-smoothing algorithm that revises texturing discontinuities caused by UV unwrapping. Moreover, we establish two T2T evaluation benchmarks: the Objaverse T2T benchmark, based on selected high-quality meshes from the Objaverse dataset, and the GSO T2T benchmark, based on the entire GSO dataset. Extensive experimental results demonstrate that MVPaint surpasses existing state-of-the-art methods. Notably, MVPaint generates high-fidelity textures with minimal Janus artifacts and substantially improved cross-view consistency.
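The abstract outlines a three-stage pipeline: SMG produces a coarse, partially painted texture, S3I fills the unobserved regions, and UVR super-resolves and smooths the result in UV space. The sketch below mirrors only that control flow with NumPy stand-ins; it is a minimal toy under stated assumptions, not the paper's implementation. Every function name and body here (`smg_stage`, `s3i_stage`, `uvr_stage`, the UV resolution, the coverage ratio) is a hypothetical illustration: the real stages rely on synchronized multi-view diffusion, 3D-aware inpainting, and UV-space super-resolution models.

```python
import numpy as np


def _shift(mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift a boolean mask by (dy, dx), padding with False (no wraparound)."""
    out = np.zeros_like(mask)
    h, w = mask.shape
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        mask[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out


def smg_stage(uv_size: int, rng: np.random.Generator):
    """Stage 1 (SMG stand-in): the real module jointly denoises multiple views
    with a synchronized multi-view diffusion model and bakes them to UV space.
    Here we fake the bake: a random texture with roughly 75% of texels observed."""
    texture = rng.random((uv_size, uv_size, 3))
    observed = rng.random((uv_size, uv_size)) > 0.25
    texture[~observed] = 0.0  # unobserved texels stay unpainted
    return texture, observed


def s3i_stage(texture: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """Stage 2 (S3I stand-in): the real module inpaints unobserved areas with
    spatial (3D) awareness. Here we flood-fill each unpainted texel from the
    mean of its already-painted 4-neighbours, one frontier ring at a time."""
    tex, obs = texture.copy(), observed.copy()
    while not obs.all():
        # Texels that are unpainted but touch at least one painted neighbour.
        frontier = ~obs & (_shift(obs, 1, 0) | _shift(obs, -1, 0)
                           | _shift(obs, 0, 1) | _shift(obs, 0, -1))
        if not frontier.any():  # safety: nothing observed at all
            break
        for y, x in zip(*np.nonzero(frontier)):
            nbrs = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            vals = [tex[p] for p in nbrs
                    if 0 <= p[0] < obs.shape[0] and 0 <= p[1] < obs.shape[1]
                    and obs[p]]
            tex[y, x] = np.mean(vals, axis=0)
        obs |= frontier
    return tex


def uvr_stage(texture: np.ndarray, scale: int = 2) -> np.ndarray:
    """Stage 3 (UVR stand-in): the real module runs UV-space super-resolution
    and spatial-aware seam smoothing. Here: nearest-neighbour upsampling
    followed by a crude 3x3 box blur as the "smoothing" step."""
    hi = texture.repeat(scale, axis=0).repeat(scale, axis=1)
    return sum(np.roll(np.roll(hi, dy, 0), dx, 1)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coarse, mask = smg_stage(64, rng)   # stage 1: coarse, partial texture
    complete = s3i_stage(coarse, mask)  # stage 2: fill unobserved texels
    final = uvr_stage(complete)         # stage 3: upscale + smooth
    print(final.shape)                  # (128, 128, 3)
```

The point of the structure is that each stage consumes the previous stage's output in UV space, so completeness (S3I) is guaranteed before refinement (UVR) runs; the actual models behind each stage are described in the paper itself.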