MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation (2403.08019v3)

Published 12 Mar 2024 in cs.CV

Abstract: We propose a single-shot approach to determining the 6-DoF pose of an object with an available 3D computer-aided design (CAD) model from a single RGB image. Our method, dubbed MRC-Net, comprises two stages. The first performs pose classification and renders the 3D object in the classified pose. The second stage performs regression to predict the fine-grained residual pose within the class. Connecting the two stages is a novel multi-scale residual correlation (MRC) layer that captures high- and low-level correspondences between the input image and the rendering from the first stage. MRC-Net employs a Siamese network with shared weights between both stages to learn embeddings for the input and rendered images. To mitigate ambiguity when predicting discrete pose class labels on symmetric objects, we use soft probabilistic labels to define the pose class in the first stage. We demonstrate state-of-the-art accuracy, outperforming all competing RGB-based methods on four challenging BOP benchmark datasets: T-LESS, LM-O, YCB-V, and ITODD. Our method is non-iterative and requires no complex post-processing.
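
The soft-label idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' formulation, which the abstract does not spell out. It assumes that each pose class is represented by a rotation matrix, that the object's symmetries are given as a finite set of rotations, and that soft labels come from a softmax over negative symmetry-aware geodesic distances. The names `soft_pose_labels`, `geodesic_angle`, `rot_z` and the `temperature` parameter are hypothetical.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def geodesic_angle(Ra, Rb):
    """Angle (radians) of the relative rotation between Ra and Rb."""
    cos = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))

def soft_pose_labels(R_gt, class_rotations, symmetries, temperature=0.2):
    """Assumed scheme: softmax over negative symmetry-aware distances.

    For each class, take the smallest geodesic distance between the class
    rotation and any symmetry-equivalent version of the ground truth.
    """
    dists = np.array([
        min(geodesic_angle(R_k, R_gt @ S) for S in symmetries)
        for R_k in class_rotations
    ])
    logits = -dists / temperature
    logits -= logits.max()          # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy setup (hypothetical): 8 classes spaced evenly about z, and an object
# that looks identical after a 180-degree turn about z.
classes = [rot_z(2.0 * np.pi * k / 8) for k in range(8)]
syms = [rot_z(0.0), rot_z(np.pi)]
labels = soft_pose_labels(rot_z(0.1), classes, syms)
```

In this toy case the classes at 0 and 180 degrees receive equal probability, so a classifier trained on these targets is not penalized for choosing either of the two visually indistinguishable poses.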

Authors (4)
  1. Yuelong Li (11 papers)
  2. Yafei Mao (3 papers)
  3. Raja Bala (9 papers)
  4. Sunil Hadap (12 papers)