VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model (2403.12010v1)

Published 18 Mar 2024 in cs.CV, cs.AI, and cs.GR

Abstract: Generating multi-view images from text or single-image prompts is a critical capability for 3D content creation. Two fundamental questions on this topic are what data to use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Rather than leveraging images from 2D diffusion models for training, we propose a dense, consistent multi-view generation model fine-tuned from off-the-shelf video generative models. Images from video generative models are more suitable for multi-view generation because the underlying network architecture employs a temporal module to enforce frame consistency. Moreover, the video datasets used to train these models are abundant and diverse, reducing the training-finetuning domain gap. To enhance multi-view consistency, we introduce 3D-Aware Denoising Sampling, which first employs a feed-forward reconstruction module to obtain an explicit global 3D model, and then adopts a sampling strategy that incorporates images rendered from the global 3D model into the denoising sampling loop to improve the multi-view consistency of the final images. As a by-product, this module also provides a fast way to create 3D assets represented by 3D Gaussians within a few seconds. Our approach generates 24 dense views and converges much faster in training than state-of-the-art approaches (4 GPU hours versus many thousands of GPU hours) with comparable visual quality and consistency. With further fine-tuning, our approach outperforms existing state-of-the-art methods in both quantitative metrics and visual quality. Our project page is aigc3d.github.io/VideoMV.
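The 3D-Aware Denoising Sampling described in the abstract alternates between denoising multi-view latents, reconstructing a global 3D model from them, and blending rendered views back into the loop. The toy sketch below illustrates that control flow only; the function names, the averaging "reconstruction," and the linear blending rule are assumptions for illustration, not the authors' actual implementation (which uses a feed-forward 3D Gaussian reconstruction and a learned video diffusion model).

```python
import numpy as np

def reconstruct_3d(views):
    """Stand-in for the feed-forward reconstruction module (hypothetical):
    averages the views to form a crude shared 'global model'."""
    return views.mean(axis=0)

def render_views(model, n_views):
    """Stand-in for rendering the global 3D model from n_views cameras."""
    return np.stack([model] * n_views)

def denoise_step(x, t):
    """Stand-in for one reverse-diffusion step (simple shrink toward 0)."""
    return x * (1.0 - 1.0 / t)

def sample(n_views=24, shape=(8, 8), steps=10, guide_weight=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_views, *shape))  # noisy multi-view latents
    for t in range(steps, 0, -1):
        x = denoise_step(x, t + 1)
        # 3D-aware correction: reconstruct a global model, render it from
        # all viewpoints, and blend the renders back into the latents,
        # pulling every view toward one consistent 3D state.
        model = reconstruct_3d(x)
        rendered = render_views(model, n_views)
        x = (1.0 - guide_weight) * x + guide_weight * rendered
    return x

views = sample()
# Cross-view spread shrinks as views are pulled toward the shared model.
spread = views.std(axis=0).max()
```

In the paper's setting, the renders come from an explicit 3D Gaussian scene, so the same mechanism also yields a fast 3D asset as a by-product of sampling.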

Authors (11)
  1. Qi Zuo (8 papers)
  2. Xiaodong Gu (62 papers)
  3. Lingteng Qiu (18 papers)
  4. Yuan Dong (30 papers)
  5. Zhengyi Zhao (12 papers)
  6. Weihao Yuan (34 papers)
  7. Rui Peng (79 papers)
  8. Siyu Zhu (64 papers)
  9. Zilong Dong (34 papers)
  10. Liefeng Bo (83 papers)
  11. Qixing Huang (78 papers)
Citations (15)