Two-stage Synthetic Supervising and Multi-view Consistency Self-supervising based Animal 3D Reconstruction by Single Image (2311.13199v3)

Published 22 Nov 2023 in cs.CV

Abstract: While the Pixel-aligned Implicit Function (PIFu) effectively captures subtle variations in body shape within a low-dimensional space through extensive training on human 3D scans, its application to live animals presents formidable challenges because animals rarely cooperate with 3D scanning. To address this challenge, we propose a combination of two-stage supervised and self-supervised training. In the first stage, we leverage synthetic animal models for supervised learning, allowing the model to learn from a diverse set of virtual animal instances. In the second stage, we use 2D multi-view consistency as a self-supervised training signal, which further enhances the model's ability to reconstruct accurate and realistic 3D shape and texture from widely available single-view images of real animals. Our results demonstrate that our approach outperforms state-of-the-art methods in both quantitative and qualitative aspects of bird 3D digitization. The source code is available at https://github.com/kuangzijian/drifu-for-animals.
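The abstract describes a pixel-aligned implicit function queried at projected 3D points, trained first with supervised occupancy labels from synthetic animals and then with a 2D multi-view consistency objective. The PyTorch sketch below illustrates one plausible shape of that pipeline. The tiny encoder, the toy orthographic projection, the exact loss forms, and all names (`PixelAlignedImplicitFn`, `stage1_loss`, `stage2_loss`) are illustrative assumptions, not the released implementation at the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def project_ortho(points):
    """Toy orthographic projection (an assumption, not the paper's camera
    model): points are taken to lie roughly in [-1, 1]^3 in camera space;
    returns normalized (u, v) image coordinates and depth z."""
    uv = points[..., :2]   # (B, N, 2), already in grid_sample's [-1, 1] range
    z = points[..., 2]     # (B, N)
    return uv, z

class PixelAlignedImplicitFn(nn.Module):
    """PIFu-style occupancy predictor: an MLP over pixel-aligned features."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Small stand-in for the paper's image encoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, padding=1),
        )
        # Maps (pixel feature, depth) -> occupancy probability in [0, 1].
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, image, points):
        # image: (B, 3, H, W); points: (B, N, 3) in camera space.
        feats = self.encoder(image)                    # (B, C, H, W)
        uv, z = project_ortho(points)
        # Pixel-aligned feature lookup at the projected 2D locations.
        sampled = F.grid_sample(feats, uv.unsqueeze(2), align_corners=True)
        sampled = sampled.squeeze(-1).transpose(1, 2)  # (B, N, C)
        inp = torch.cat([sampled, z.unsqueeze(-1)], dim=-1)
        return self.mlp(inp).squeeze(-1)               # (B, N) occupancy

def stage1_loss(model, image, points, gt_occupancy):
    """Stage 1: supervised occupancy loss against synthetic animal models."""
    return F.binary_cross_entropy(model(image, points), gt_occupancy)

def stage2_loss(model, view_images, view_points):
    """Stage 2 (self-supervised): occupancy predicted for the same world
    samples, expressed in each view's camera space, should agree across
    views; variance over two or more views penalizes disagreement. This is
    one simple reading of "2D multi-view consistency", not the paper's
    exact formulation."""
    preds = torch.stack(
        [model(img, pts) for img, pts in zip(view_images, view_points)]
    )                                                  # (V, B, N)
    return preds.var(dim=0).mean()
```

In this reading, stage 1 gives the network a dense 3D supervisory signal that only synthetic data can provide, while stage 2 needs no ground-truth geometry at all, which is what makes single-view photos of real animals usable as training data.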

References (39)
  1. S. Saito, Z. Huang, R. Natsume, S. Morishima, A. Kanazawa, and H. Li, “PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization,” 2019.
  2. M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black, “SMPL: A skinned multi-person linear model,” pp. 248:1–248:16, Oct. 2015.
  3. A. Kanazawa, S. Tulsiani, A. A. Efros, and J. Malik, “Learning category-specific mesh reconstruction from image collections,” 2018.
  4. X. Li, S. Liu, K. Kim, S. De Mello, V. Jampani, M.-H. Yang, and J. Kautz, “Self-supervised single-view 3d reconstruction via semantic consistency,” 2020.
  5. H. Fan, H. Su, and L. Guibas, “A point set generation network for 3d object reconstruction from a single image,” 2016.
  6. S. Liu, S. Saito, W. Chen, and H. Li, “Learning to infer implicit surfaces without 3d supervision,” 2019.
  7. L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, “Occupancy networks: Learning 3d reconstruction in function space,” 2019.
  8. H. Kato, Y. Ushiku, and T. Harada, “Neural 3d mesh renderer,” 2017.
  9. S. Liu, T. Li, W. Chen, and H. Li, “Soft rasterizer: A differentiable renderer for image-based 3d reasoning,” 2019.
  10. H. Kato and T. Harada, “Learning view priors for single-view 3d reconstruction,” 2019.
  11. N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang, “Pixel2mesh: Generating 3d mesh models from single rgb images,” 2018.
  12. J. Pan, X. Han, W. Chen, J. Tang, and K. Jia, “Deep mesh reconstruction from single rgb images via topology modification networks,” 2019.
  13. C. Wen, Y. Zhang, Z. Li, and Y. Fu, “Pixel2mesh++: Multi-view 3d mesh generation via deformation,” 2019.
  14. C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese, “3d-r2n2: A unified approach for single and multi-view 3d object reconstruction,” 2016.
  15. R. Girdhar, D. F. Fouhey, M. Rodriguez, and A. Gupta, “Learning a predictable and generative vector representation for objects,” 2016.
  16. J. Gwak, C. B. Choy, A. Garg, M. Chandraker, and S. Savarese, “Weakly supervised 3d reconstruction with adversarial constraint,” 2017.
  17. S. Tulsiani, T. Zhou, A. A. Efros, and J. Malik, “Multi-view supervision for single-view reconstruction via differentiable ray consistency,” 2017.
  18. O. Wiles and A. Zisserman, “SilNet: Single- and multi-view reconstruction by learning from silhouettes,” 2017.
  19. X. Yan, J. Yang, E. Yumer, Y. Guo, and H. Lee, “Perspective transformer nets: Learning single-view 3d object reconstruction without 3d supervision,” 2017.
  20. R. Zhu, H. K. Galoogahi, C. Wang, and S. Lucey, “Rethinking reprojection: Closing the loop for pose-aware shape reconstruction from a single image,” 2017.
  21. C. Häne, S. Tulsiani, and J. Malik, “Hierarchical surface prediction for 3d object reconstruction,” 2017.
  22. J. Wu, C. Zhang, T. Xue, W. T. Freeman, and J. B. Tenenbaum, “Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling,” 2017.
  23. J.-Y. Zhu, Z. Zhang, C. Zhang, J. Wu, A. Torralba, J. B. Tenenbaum, and W. T. Freeman, “Visual object networks: Image generation with disentangled 3d representation,” 2018.
  24. C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” 2017.
  25. C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,” 2017.
  26. Z. Kuang, L. Ying, X. Tie, and S. Jin, “Normalizing flow based defect detection with motion detection,” in International Conference on Smart Multimedia. Springer, 2022, pp. 3–17.
  27. P. Henderson and V. Ferrari, “Learning to generate and reconstruct 3d meshes with only 2d supervision,” 2018.
  28. C. Sun, L. Bin Song, and L. Ying, “Product re-identification system in fully automated defect detection,” in International Conference on Smart Multimedia. Springer, 2022, pp. 144–156.
  29. Y. Xiang, W. Kim, W. Chen, J. Ji, C. Choy, H. Su, R. Mottaghi, L. Guibas, and S. Savarese, “Objectnet3d: A large scale database for 3d object recognition,” 2016.
  30. A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu, “Shapenet: An information-rich 3d model repository,” 2015.
  31. W. Chen, J. Gao, H. Ling, E. J. Smith, J. Lehtinen, A. Jacobson, and S. Fidler, “Learning to predict 3d objects with an interpolation-based differentiable renderer,” 2019.
  32. Z. Kuang, X. Tie, X. Wu, and L. Ying, “FuNet: Flow based conference video background subtraction,” in International Conference on Smart Multimedia. Springer, 2022, pp. 18–28.
  33. A. Szabó and P. Favaro, “Unsupervised 3d shape learning from image collections in the wild,” 2018.
  34. S. Wu, C. Rupprecht, and A. Vedaldi, “Photo-geometric autoencoding to learn 3d objects from unlabelled images,” 2019.
  35. P. Henderson and V. Ferrari, “Learning single-image 3d reconstruction by generative modelling of shape, pose and shading,” 2019.
  36. C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, “The Caltech-UCSD Birds-200-2011 dataset,” 2011.
  37. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” 2009.
  38. N. Ravi, J. Reizenstein, D. Novotny, T. Gordon, W.-Y. Lo, J. Johnson, and G. Gkioxari, “Accelerating 3d deep learning with pytorch3d,” 2020.
  39. E. Borenstein, “Weizmann horse database,” 2011.
