
Geometry aware 3D generation from in-the-wild images in ImageNet (2402.00225v2)

Published 31 Jan 2024 in cs.CV

Abstract: Generating accurate 3D models is a challenging problem that traditionally requires explicit learning from 3D datasets using supervised learning. Although recent advances have shown promise in learning 3D models from 2D images, these methods often rely on well-structured datasets with multi-view images of each instance or camera pose information. Furthermore, these datasets usually contain clean backgrounds and simple shapes, which makes them expensive to acquire and makes it hard for models trained on them to generalize, limiting the applicability of these methods. To overcome these limitations, we propose a method for reconstructing 3D geometry from the diverse and unstructured ImageNet dataset without camera pose information. We use an efficient triplane representation to learn 3D models from 2D images and modify the architecture of the generator backbone based on StyleGAN2 to adapt to the highly diverse dataset. To prevent mode collapse and improve training stability on diverse data, we propose to use multi-view discrimination. The trained generator can produce class-conditional 3D models as well as renderings from arbitrary viewpoints. The class-conditional generation results demonstrate significant improvement over the current state-of-the-art method. Additionally, using pivotal tuning inversion (PTI), we can efficiently reconstruct the whole 3D geometry from single-view images.
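
To make the triplane representation mentioned in the abstract more concrete, here is a minimal NumPy sketch of how a 3D point can be turned into a feature vector by projecting it onto three axis-aligned feature planes and combining the bilinearly interpolated features. This is a generic illustration of the triplane idea, not the authors' implementation: the function names `bilinear_sample` and `query_triplane`, the plane ordering (XY, XZ, YZ), the coordinate range of [-1, 1], and the use of summation to aggregate the three features are all assumptions made for this example.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at N points.

    u and v are arrays of shape (N,) with values in [-1, 1];
    u indexes the width axis and v indexes the height axis.
    Returns an (N, C) array of interpolated features.
    """
    _, H, W = plane.shape
    # Map normalized coordinates to continuous pixel positions.
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    # Index of the upper-left neighbor, clipped so x0 + 1 and y0 + 1 stay valid.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = x - x0
    wy = y - y0
    # Interpolate along the width axis, then along the height axis.
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x0 + 1] * wx
    bottom = plane[:, y0 + 1, x0] * (1 - wx) + plane[:, y0 + 1, x0 + 1] * wx
    return (top * (1 - wy) + bottom * wy).T


def query_triplane(planes, points):
    """Look up features for 3D points in a triplane.

    planes: (3, C, H, W) array holding the XY, XZ and YZ feature planes
            (this ordering is an assumption of the sketch).
    points: (N, 3) array of coordinates in [-1, 1].
    Returns an (N, C) array; the three per-plane features are summed.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    f_xy = bilinear_sample(planes[0], x, y)
    f_xz = bilinear_sample(planes[1], x, z)
    f_yz = bilinear_sample(planes[2], y, z)
    return f_xy + f_xz + f_yz


# Example: 8 feature channels on 32x32 planes, queried at 5 random points.
rng = np.random.default_rng(0)
example_planes = rng.normal(size=(3, 8, 32, 32))
example_points = rng.uniform(-1.0, 1.0, size=(5, 3))
example_features = query_triplane(example_planes, example_points)
```

In a full pipeline the resulting per-point features would typically be passed to a small decoder that predicts density and color for volume rendering; that step is omitted here because the abstract does not describe it.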

Authors (2)
  1. Qijia Shen (1 paper)
  2. Guangrun Wang (43 papers)
