FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos (2403.15161v1)
Abstract: Digitising the 3D world into a clean, CAD model-based representation has important applications for augmented reality and robotics. Current state-of-the-art methods are computationally intensive because they encode each detected object individually and optimise CAD alignments in a second stage. In this work, we propose FastCAD, a real-time method that simultaneously retrieves and aligns CAD models for all objects in a given scene. In contrast to previous works, we directly predict alignment parameters and shape embeddings. We achieve high-quality shape retrievals by learning CAD embeddings in a contrastive learning framework and distilling them into FastCAD. Our single-stage method speeds up inference by a factor of 50 compared to other methods operating on RGB-D scans, while outperforming them on the challenging Scan2CAD alignment benchmark. Further, our approach integrates seamlessly with online 3D reconstruction techniques. This enables the real-time generation of precise CAD model-based reconstructions from videos at 10 FPS. In doing so, we significantly improve the Scan2CAD alignment accuracy in the video setting from 43.0% to 48.2% and the reconstruction accuracy from 22.9% to 29.6%.
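The retrieval step described in the abstract, matching a predicted shape embedding against a database of CAD embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the similarity measure, so cosine similarity is an assumption here, and the names `retrieve`, `cad_db`, and `detections`, as well as the toy 3-dimensional embeddings, are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def retrieve(pred_embedding, cad_embeddings):
    """Return (cad_id, score) of the CAD model closest to the predicted embedding.

    Assumption: similarity is cosine similarity; the source does not specify it.
    """
    query = normalize(pred_embedding)
    best_id, best_score = None, -math.inf
    for cad_id, emb in cad_embeddings.items():
        score = sum(q * c for q, c in zip(query, normalize(emb)))
        if score > best_score:
            best_id, best_score = cad_id, score
    return best_id, best_score


# Hypothetical toy database of precomputed CAD embeddings.
cad_db = {
    "chair_a": [1.0, 0.0, 0.0],
    "table_b": [0.0, 1.0, 0.0],
    "sofa_c": [0.0, 0.0, 1.0],
}

# Hypothetical per-object embeddings, as a single-stage network might predict
# for every detected object in one forward pass.
detections = {
    "obj_0": [0.9, 0.1, 0.0],
    "obj_1": [0.1, 0.2, 0.95],
}

matches = {obj: retrieve(emb, cad_db) for obj, emb in detections.items()}
```

Because the database embeddings can be computed once offline, retrieval at inference time reduces to one nearest-neighbour lookup per detected object, with no per-object encoding pass.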