GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects (2403.11510v1)

Published 18 Mar 2024 in cs.CV

Abstract: Despite the progress of learning-based methods for 6D object pose estimation, a trade-off between accuracy and scalability for novel objects still exists. Specifically, previous methods for novel objects do not make good use of the target object's 3D shape information: they pursue generalization by processing the shape only indirectly, which makes them less effective. We present GenFlow, an approach that achieves both accuracy and generalization to novel objects under the guidance of the target object's shape. Our method predicts optical flow between the rendered image and the observed image and refines the 6D pose iteratively. It improves performance through a 3D shape constraint and generalizable geometric knowledge learned by an end-to-end differentiable system. We further improve the model with a cascade network architecture that exploits multi-scale correlations and coarse-to-fine refinement. GenFlow ranked first on the unseen-object pose estimation benchmarks in both the RGB and RGB-D settings. It also achieves performance competitive with existing state-of-the-art methods for seen-object pose estimation without any fine-tuning.
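
The refinement loop summarized in the abstract (render the object at the current pose, predict optical flow from the rendered image to the observed image, update the 6D pose, repeat) can be illustrated with a minimal NumPy sketch. It is not GenFlow's implementation: the learned flow network is replaced by a hypothetical oracle (`oracle_flow`) that returns the exact pixel displacement of each model point, "rendering" is reduced to projecting the model's vertices with a pinhole camera, and the pose update is a plain Gauss-Newton step on the reprojection error rather than the paper's differentiable solver. What it does show is the role of the shape constraint: because the 3D model is rigid, the 2N flow values for N points collapse into a single 6-DoF pose update. The intrinsics, model points, and poses are made-up values.

```python
import numpy as np


def skew(v):
    """3x3 matrix [v]_x such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def so3_exp(w):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + skew(w)
    k = skew(w / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)


def project(points, R, t, K):
    """Pinhole projection of Nx3 model points under pose (R, t); returns Nx2 pixels."""
    uv = (points @ R.T + t) @ K.T
    return uv[:, :2] / uv[:, 2:3]


def gauss_newton_step(points, R, t, K, target_uv):
    """One Gauss-Newton update of (R, t) reducing the reprojection error to the
    flow-displaced targets. The rigid shape turns 2N flow values into 6 unknowns."""
    fx, fy = K[0, 0], K[1, 1]
    rp = points @ R.T                      # rotated points R p
    X, Y, Z = (rp + t).T                   # camera-frame coordinates
    residual = (project(points, R, t, K) - target_uv).reshape(-1)
    J = np.zeros((2 * len(points), 6))
    for i in range(len(points)):
        # d(u, v) / d(X, Y, Z) for the pinhole model
        d_proj = np.array([[fx / Z[i], 0.0, -fx * X[i] / Z[i] ** 2],
                           [0.0, fy / Z[i], -fy * Y[i] / Z[i] ** 2]])
        # left rotation perturbation: dX/dw = -[R p]_x, dX/dt = I
        d_cam = np.hstack([-skew(rp[i]), np.eye(3)])
        J[2 * i:2 * i + 2] = d_proj @ d_cam
    delta = np.linalg.lstsq(J, -residual, rcond=None)[0]
    return so3_exp(delta[:3]) @ R, t + delta[3:]


def refine_pose(points, K, R, t, predict_flow, iterations=10):
    """Render-and-compare loop: project at the current pose, get flow to the
    observation, and update the pose from the induced 2D-3D correspondences."""
    for _ in range(iterations):
        rendered_uv = project(points, R, t, K)   # stand-in for rendering
        flow = predict_flow(rendered_uv)         # stand-in for the flow network
        R, t = gauss_newton_step(points, R, t, K, rendered_uv + flow)
    return R, t


rng = np.random.default_rng(0)
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
model = rng.uniform(-0.05, 0.05, size=(200, 3))  # hypothetical object vertices (metres)
R_gt = so3_exp(np.array([0.3, -0.2, 0.1]))
t_gt = np.array([0.02, -0.01, 0.6])
observed_uv = project(model, R_gt, t_gt, K)      # where the object appears in the image


def oracle_flow(rendered_uv):
    """Hypothetical stand-in for the learned flow: exact rendered -> observed motion."""
    return observed_uv - rendered_uv


R0 = so3_exp(np.array([0.1, 0.05, -0.08])) @ R_gt  # initial pose, roughly 8 degrees off
t0 = t_gt + np.array([0.01, -0.015, 0.03])
R_est, t_est = refine_pose(model, K, R0, t0, oracle_flow)

rot_err = np.degrees(np.arccos(np.clip((np.trace(R_est.T @ R_gt) - 1) / 2, -1, 1)))
print(f"rotation error {rot_err:.2e} deg, "
      f"translation error {np.linalg.norm(t_est - t_gt):.2e} m")
```

Because the oracle flow is exact, this loop converges to the ground-truth pose within a few iterations. With a learned, imperfect flow, re-rendering at each improved pose means later iterations only have to explain a smaller residual motion, which is the usual motivation for iterative render-and-compare refinement.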
