
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models (2312.00947v3)

Published 1 Dec 2023 in cs.CV

Abstract: Estimating the 6D pose of objects unseen during training is highly desirable yet challenging. Zero-shot object 6D pose estimation methods address this challenge by leveraging additional task-specific supervision provided by large-scale, photo-realistic synthetic datasets. However, their performance heavily depends on the quality and diversity of rendered data and they require extensive training. In this work, we show how to tackle the same task but without training on specific data. We propose FreeZe, a novel solution that harnesses the capabilities of pre-trained geometric and vision foundation models. FreeZe leverages 3D geometric descriptors learned from unrelated 3D point clouds and 2D visual features learned from web-scale 2D images to generate discriminative 3D point-level descriptors. We then estimate the 6D pose of unseen objects by 3D registration based on RANSAC. We also introduce a novel algorithm to solve ambiguous cases due to geometrically symmetric objects that is based on visual features. We comprehensively evaluate FreeZe across the seven core datasets of the BOP Benchmark, which include over a hundred 3D objects and 20,000 images captured in various scenarios. FreeZe consistently outperforms all state-of-the-art approaches, including competitors extensively trained on synthetic 6D pose estimation data. Code will be publicly available at https://andreacaraffa.github.io/freeze.
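The pipeline the abstract describes (fuse geometric and visual descriptors per 3D point, match them between the object model and the scene, then estimate the pose by RANSAC-based registration) can be illustrated with a small sketch. This is not the authors' implementation: the fusion rule (L2-normalize each descriptor and concatenate), the three-point frame solver, every function name, and the toy descriptors are assumptions made for illustration, and the symmetry-handling step is not covered.

```python
import math
import random

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def fuse(geo, vis):
    # Assumed fusion rule: normalize each descriptor, then concatenate.
    return l2_normalize(geo) + l2_normalize(vis)

def match(src_desc, dst_desc):
    # Nearest neighbor in fused descriptor space, as (src index, dst index).
    return [(i, min(range(len(dst_desc)), key=lambda j: math.dist(d, dst_desc[j])))
            for i, d in enumerate(src_desc)]

def sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def frame(p0, p1, p2):
    # Right-handed orthonormal frame from three points; None if near-collinear.
    e1 = sub(p1, p0)
    n = cross(e1, sub(p2, p0))
    if math.hypot(*n) < 1e-9:
        return None
    e1, e3 = l2_normalize(e1), l2_normalize(n)
    return [e1, cross(e3, e1), e3]

def apply(R, t, p):
    # Rigid transform: R @ p + t.
    return [sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)]

def ransac_pose(src, dst, matches, iters=100, thresh=1e-6, seed=0):
    # Hypothesize a pose from 3 random matches, keep the one with most inliers.
    rng = random.Random(seed)
    best = (None, None, -1)
    for _ in range(iters):
        sample = rng.sample(matches, 3)
        fs = frame(*(src[i] for i, _ in sample))
        fd = frame(*(dst[j] for _, j in sample))
        if fs is None or fd is None:
            continue
        # Rotation that maps the source frame axes onto the destination axes.
        R = [[sum(fd[k][r] * fs[k][c] for k in range(3)) for c in range(3)]
             for r in range(3)]
        i0, j0 = sample[0]
        t = sub(dst[j0], apply(R, [0.0, 0.0, 0.0], src[i0]))
        inliers = sum(math.dist(apply(R, t, src[i]), dst[j]) < thresh
                      for i, j in matches)
        if inliers > best[2]:
            best = (R, t, inliers)
    return best

# Toy data: a 6-point "model" and a rotated, translated, reordered "scene".
model = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
R_true = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # 90 degrees about the z-axis
t_true = [0.5, -0.2, 1.0]
scene = [apply(R_true, t_true, p) for p in reversed(model)]

# Hypothetical per-point descriptors standing in for foundation-model features.
geo = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
vis = [[1, 0], [0, 1], [1, 1], [1, 2], [2, 1], [1, 3]]
model_desc = [fuse(g, v) for g, v in zip(geo, vis)]
scene_desc = list(reversed(model_desc))

matches = match(model_desc, scene_desc)
R_est, t_est, n_inliers = ransac_pose(model, scene, matches)
```

On this noise-free toy data the recovered rotation and translation agree with `R_true` and `t_true` up to floating-point error. With real, noisy correspondences one would typically use a looser inlier threshold and refit the pose on all inliers with a least-squares solver.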

