Sketch2CAD: 3D CAD Model Reconstruction from 2D Sketch using Visual Transformer (2309.16850v2)
Abstract: Current 3D reconstruction methods typically generate outputs as voxels, point clouds, or meshes. Each of these formats has inherent limitations, such as rough surfaces and distorted structures, and none is well suited to further manual editing and post-processing. In this paper, we present a novel 3D reconstruction method designed to overcome these disadvantages by reconstructing CAD-compatible models. We train a visual transformer to predict a "scene descriptor" from a single 2D wireframe image. This descriptor encodes the essential information about each object in the scene: its type and parameters such as position, rotation, and size. From the predicted parameters, the 3D scene can be rebuilt in modeling software with a programmable interface, such as Rhino Grasshopper, yielding highly editable models in boundary representation (B-rep) form. To evaluate the proposed model, we created two datasets: one consisting of simple scenes and another with more complex scenes. The test results show that the model reconstructs simple scenes accurately while struggling with more complex ones.
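To make the pipeline concrete, the "scene descriptor" can be thought of as a list of parameterized primitives that a Grasshopper script consumes to rebuild the B-rep scene. The sketch below is a minimal illustration under that reading; the names (`ObjectDescriptor`, `scene_to_json`) and the exact parameter layout are assumptions for illustration, not the paper's actual interface.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ObjectDescriptor:
    """One entry of a hypothetical scene descriptor: a primitive plus its pose."""
    obj_type: str     # e.g. "box", "cylinder", "sphere"
    position: tuple   # (x, y, z) in scene coordinates
    rotation: tuple   # Euler angles (rx, ry, rz) in degrees
    size: tuple       # per-axis extents (sx, sy, sz)

def scene_to_json(objects):
    """Serialize predicted parameters so a modeling-side script can rebuild the scene."""
    return json.dumps([asdict(o) for o in objects])

# A toy two-object scene, as the transformer might predict it.
scene = [
    ObjectDescriptor("box", (0.0, 0.0, 0.0), (0.0, 0.0, 45.0), (2.0, 1.0, 1.0)),
    ObjectDescriptor("cylinder", (3.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 2.0)),
]
print(scene_to_json(scene))
```

On the modeling side, a Grasshopper (Python) component would parse this JSON and instantiate the corresponding solids, which is what makes the output editable B-rep geometry rather than a fixed mesh.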