Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning (2312.13980v2)

Published 21 Dec 2023 in cs.CV and cs.LG

Abstract: Multi-view diffusion models, obtained by applying Supervised Finetuning (SFT) to text-to-image diffusion models, have driven recent breakthroughs in text-to-3D research. However, due to the limited size and quality of existing 3D datasets, they still suffer from multi-view inconsistencies and Neural Radiance Field (NeRF) reconstruction artifacts. We argue that multi-view diffusion models can benefit from further Reinforcement Learning Finetuning (RLFT), which allows models to learn from data they generate themselves and to improve beyond the limitations of their SFT datasets. To this end, we introduce Carve3D, an improved RLFT algorithm coupled with a novel Multi-view Reconstruction Consistency (MRC) metric, to enhance the consistency of multi-view diffusion models. To compute the MRC metric on a set of multi-view images, we compare them with their corresponding NeRF renderings at the same camera viewpoints. The resulting model, which we denote Carve3DM, demonstrates multi-view consistency and NeRF reconstruction quality superior to those of existing models. Our results suggest that pairing SFT with Carve3D's RLFT is essential for developing multi-view-consistent diffusion models, mirroring the standard LLM alignment pipeline. Our code, training and testing data, and video results are available at: https://desaixie.github.io/carve-3d.
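The MRC idea described in the abstract, comparing each generated view against the NeRF rendering from the same camera, can be sketched as a scalar reward. This is only a minimal illustration: the function name `mrc_reward` and the array shapes are assumptions, and plain per-pixel MSE stands in for whatever perceptual image distance the paper actually uses, purely to keep the sketch dependency-free.

```python
import numpy as np

def mrc_reward(generated_views, nerf_renderings):
    """Sketch of an MRC-style reward (hypothetical interface).

    Both inputs are float arrays of shape (num_views, H, W, 3) in
    [0, 1]: the diffusion model's multi-view images and the NeRF
    re-renderings at the same camera viewpoints. Per-pixel MSE is
    used here as a stand-in for a perceptual image distance.
    """
    assert generated_views.shape == nerf_renderings.shape
    # Average reconstruction error per view, then across views.
    per_view_err = np.mean(
        (generated_views - nerf_renderings) ** 2, axis=(1, 2, 3)
    )
    # Lower reconstruction error (more consistent views) -> higher reward.
    return -float(np.mean(per_view_err))
```

Under this sketch, perfectly consistent views whose NeRF re-renders them exactly would receive the maximum reward of zero, and any reconstruction artifact lowers the reward, which is the signal an RLFT loop would maximize.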

Authors (9)
  1. Desai Xie
  2. Jiahao Li
  3. Hao Tan
  4. Xin Sun
  5. Zhixin Shu
  6. Yi Zhou
  7. Sai Bi
  8. Sören Pirk
  9. Arie E. Kaufman
Citations (3)

