
DeepRM: Deep Recurrent Matching for 6D Pose Refinement (2205.14474v5)

Published 28 May 2022 in cs.CV

Abstract: Precise 6D pose estimation of rigid objects from RGB images is a critical but challenging task in robotics, augmented reality and human-computer interaction. To address this problem, we propose DeepRM, a novel recurrent network architecture for 6D pose refinement. DeepRM leverages initial coarse pose estimates to render synthetic images of target objects. The rendered images are then matched with the observed images to predict a rigid transform for updating the previous pose estimate. This process is repeated to incrementally refine the estimate at each iteration. The DeepRM architecture incorporates LSTM units to propagate information through each refinement step, significantly improving overall performance. In contrast to current 2-stage Perspective-n-Point based solutions, DeepRM is trained end-to-end, and uses a scalable backbone that can be tuned via a single parameter for accuracy and efficiency. During training, a multi-scale optical flow head is added to predict the optical flow between the observed and synthetic images. Optical flow prediction stabilizes the training process, and enforces the learning of features that are relevant to the task of pose estimation. Our results demonstrate that DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
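
The render, match, and update loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `render`, `toy_predict`, and `refine` are hypothetical names, the "renderer" simply returns the pose itself, the "network" is a hand-written stand-in that moves a fixed fraction of the way toward the observed pose, and the update rule R' = dR R, t' = dR t + dt is one plausible way to apply a rigid transform (the abstract does not specify the parameterization). The `state` value stands in for the LSTM hidden state carried between refinement steps.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec3(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def compose(delta, pose):
    """Apply a rigid update (dR, dt) to a pose (R, t).

    Assumed update rule: R' = dR R, t' = dR t + dt.
    """
    dR, dt = delta
    R, t = pose
    rotated_t = matvec3(dR, t)
    return matmul3(dR, R), [rotated_t[i] + dt[i] for i in range(3)]

def render(pose):
    """Hypothetical renderer: a real one would rasterize the object model."""
    return pose

def toy_predict(observed, rendered, state, gain=0.5):
    """Stand-in for the matching network (z-axis rotations only).

    Returns a rigid update that closes a fraction `gain` of the remaining
    error, plus a new recurrent state (here just a step counter).
    """
    (R_obs, t_obs), (R_ren, t_ren) = observed, rendered
    R_rel = matmul3(R_obs, transpose(R_ren))
    angle = math.atan2(R_rel[1][0], R_rel[0][0])
    dR = rot_z(gain * angle)
    t_next = [t_ren[i] + gain * (t_obs[i] - t_ren[i]) for i in range(3)]
    rotated_t = matvec3(dR, t_ren)
    dt = [t_next[i] - rotated_t[i] for i in range(3)]
    steps = 0 if state is None else state
    return (dR, dt), steps + 1

def refine(observed, init_pose, predict_update, num_iters=4):
    """Iteratively refine a coarse pose; the state persists across steps."""
    pose, state = init_pose, None
    history = [pose]
    for _ in range(num_iters):
        rendered = render(pose)                      # synthesize current guess
        delta, state = predict_update(observed, rendered, state)
        pose = compose(delta, pose)                  # update previous estimate
        history.append(pose)
    return pose, history

target = (rot_z(0.4), [0.1, -0.2, 1.0])   # pose behind the "observed image"
coarse = (rot_z(0.0), [0.0, 0.0, 0.8])    # initial coarse estimate
final_pose, history = refine(target, coarse, toy_predict)
```

With `gain=0.5`, each pass halves the remaining rotation and translation error, so the estimate approaches the target incrementally, mirroring the per-iteration refinement the abstract describes. In the actual method the predictor would be a learned network operating on image pairs rather than on poses directly.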

Authors (2)
  1. Alexander Avery (1 paper)
  2. Andreas Savakis (27 papers)
