FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos (2403.15161v1)

Published 22 Mar 2024 in cs.CV

Abstract: Digitising the 3D world into a clean, CAD model-based representation has important applications in augmented reality and robotics. Current state-of-the-art methods are computationally intensive because they encode each detected object individually and then optimise CAD alignments in a second stage. In this work, we propose FastCAD, a real-time method that simultaneously retrieves and aligns CAD models for all objects in a given scene. In contrast to previous works, we directly predict alignment parameters and shape embeddings. We achieve high-quality shape retrievals by learning CAD embeddings in a contrastive learning framework and distilling them into FastCAD. Our single-stage method accelerates inference by a factor of 50 compared to other methods operating on RGB-D scans, while outperforming them on the challenging Scan2CAD alignment benchmark. Furthermore, our approach integrates seamlessly with online 3D reconstruction techniques, enabling the real-time generation of precise CAD model-based reconstructions from videos at 10 FPS. In doing so, we significantly improve the Scan2CAD alignment accuracy in the video setting from 43.0% to 48.2% and the reconstruction accuracy from 22.9% to 29.6%.
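The abstract describes retrieval by predicting a shape embedding per object and matching it against CAD embeddings learned contrastively, then distilled into the detector. The sketch below illustrates that general pattern only; it is not FastCAD's implementation. The InfoNCE objective, the temperature of 0.1, the cosine-based distillation loss, the toy 3-D embeddings, and all function names (`info_nce_loss`, `distillation_loss`, `retrieve`) are assumptions chosen for illustration, since the abstract does not specify the exact losses or embedding size.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    a_n, b_n = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.1) -> float:
    """Assumed contrastive objective (InfoNCE): anchors[i] should score highest
    against positives[i], with the other positives in the batch as negatives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax at the matching index
    return total / len(anchors)


def distillation_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Assumed distillation target: mean (1 - cosine) between embeddings predicted
    by the detector (student) and the pre-trained CAD embeddings (teacher)."""
    return sum(1.0 - cosine(s, t) for s, t in zip(student, teacher)) / len(student)


def retrieve(query: Vector, cad_db: List[Vector]) -> int:
    """Return the index of the CAD embedding with the highest cosine similarity."""
    scores = [cosine(query, c) for c in cad_db]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy usage: three hypothetical CAD embeddings and two predicted object embeddings.
cad_db = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
predicted = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]
print([retrieve(p, cad_db) for p in predicted])  # [0, 2]
```

Because every object in the scene gets its embedding in the same forward pass, retrieval reduces to a nearest-neighbour lookup instead of a separate per-object encoding stage; in practice the lookup would be batched as a matrix product over the whole CAD database.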
